Results 1 - 20 of 6,178
1.
Cell ; 187(10): 2502-2520.e17, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38729110

ABSTRACT

Human tissue, which is inherently three-dimensional (3D), is traditionally examined through standard-of-care histopathology as limited two-dimensional (2D) cross-sections that can insufficiently represent the tissue due to sampling bias. To holistically characterize histomorphology, 3D imaging modalities have been developed, but clinical translation is hampered by complex manual evaluation and lack of computational platforms to distill clinical insights from large, high-resolution datasets. We present TriPath, a deep-learning platform for processing tissue volumes and efficiently predicting clinical outcomes based on 3D morphological features. Recurrence risk-stratification models were trained on prostate cancer specimens imaged with open-top light-sheet microscopy or microcomputed tomography. By comprehensively capturing 3D morphologies, 3D volume-based prognostication achieves superior performance to traditional 2D slice-based approaches, including clinical/histopathological baselines from six certified genitourinary pathologists. Incorporating greater tissue volume improves prognostic performance and mitigates risk prediction variability from sampling bias, further emphasizing the value of capturing larger extents of heterogeneous morphology.


Subject(s)
Imaging, Three-Dimensional , Prostatic Neoplasms , Supervised Machine Learning , Humans , Male , Deep Learning , Imaging, Three-Dimensional/methods , Prognosis , Prostatic Neoplasms/pathology , Prostatic Neoplasms/diagnostic imaging , X-Ray Microtomography/methods
2.
Cell ; 187(3): 526-544, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38306980

ABSTRACT

Methods from artificial intelligence (AI) trained on large datasets of sequences and structures can now "write" proteins with new shapes and molecular functions de novo, without starting from proteins found in nature. In this Perspective, I will discuss the state of the field of de novo protein design at the juncture of physics-based modeling approaches and AI. New protein folds and higher-order assemblies can be designed with considerable experimental success rates, and difficult problems requiring tunable control over protein conformations and precise shape complementarity for molecular recognition are coming into reach. Emerging approaches incorporate engineering principles (tunability, controllability, and modularity) into the design process from the beginning. Exciting frontiers lie in deconstructing cellular functions with de novo proteins and, conversely, constructing synthetic cellular signaling from the ground up. Even as methods improve, many challenges remain unsolved.


Subject(s)
Artificial Intelligence , Proteins , Protein Conformation , Proteins/chemistry , Proteins/metabolism , Protein Engineering , Deep Learning
3.
Cell ; 187(6): 1490-1507.e21, 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38452761

ABSTRACT

Cell cycle progression relies on coordinated changes in the composition and subcellular localization of the proteome. By applying two distinct convolutional neural networks on images of millions of live yeast cells, we resolved proteome-level dynamics in both concentration and localization during the cell cycle, with resolution of ∼20 subcellular localization classes. We show that a quarter of the proteome displays cell cycle periodicity, with proteins tending to be controlled either at the level of localization or concentration, but not both. Distinct levels of protein regulation are preferentially utilized for different aspects of the cell cycle, with changes in protein concentration being mostly involved in cell cycle control and changes in protein localization in the biophysical implementation of the cell cycle program. We present a resource for exploring global proteome dynamics during the cell cycle, which will aid in understanding a fundamental biological process at a systems level.


Subject(s)
Saccharomyces cerevisiae Proteins , Saccharomyces cerevisiae , Eukaryotic Cells/metabolism , Neural Networks, Computer , Proteome/metabolism , Saccharomyces cerevisiae/cytology , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism
4.
Immunity ; 2024 Aug 15.
Article in English | MEDLINE | ID: mdl-39163866

ABSTRACT

Despite decades of antibody research, it remains challenging to predict the specificity of an antibody solely based on its sequence. Two major obstacles are the lack of appropriate models and the inaccessibility of datasets for model training. In this study, we curated >5,000 influenza hemagglutinin (HA) antibodies by mining research publications and patents, which revealed many distinct sequence features between antibodies to HA head and stem domains. We then leveraged this dataset to develop a lightweight memory B cell language model (mBLM) for sequence-based antibody specificity prediction. Model explainability analysis showed that mBLM could identify key sequence features of HA stem antibodies. Additionally, by applying mBLM to HA antibodies with unknown epitopes, we discovered and experimentally validated many HA stem antibodies. Overall, this study not only advances our molecular understanding of the antibody response to the influenza virus but also provides a valuable resource for applying deep learning to antibody research.

5.
Trends Genet ; 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39117482

ABSTRACT

Harnessing cutting-edge technologies to enhance crop productivity is a pivotal goal in modern plant breeding. Artificial intelligence (AI) is renowned for its prowess in big data analysis and pattern recognition, and is revolutionizing numerous scientific domains including plant breeding. We explore the wider potential of AI tools in various facets of breeding, including data collection, unlocking genetic diversity within genebanks, and bridging the genotype-phenotype gap to facilitate crop breeding. This will enable the development of crop cultivars tailored to the projected future environments. Moreover, AI tools also hold promise for refining crop traits by improving the precision of gene-editing systems and predicting the potential effects of gene variants on plant phenotypes. Leveraging AI-enabled precision breeding can augment the efficiency of breeding programs and holds promise for optimizing cropping systems at the grassroots level. This entails identifying optimal inter-cropping and crop-rotation models to enhance agricultural sustainability and productivity in the field.

6.
Am J Hum Genet ; 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39146935

ABSTRACT

Large language models (LLMs) are generating interest in medical settings. For example, LLMs can respond coherently to medical queries by providing plausible differential diagnoses based on clinical notes. However, there are many questions to explore, such as evaluating differences between open- and closed-source LLMs as well as LLM performance on queries from both medical and non-medical users. In this study, we assessed multiple LLMs, including Llama-2-chat, Vicuna, Medllama2, Bard/Gemini, Claude, ChatGPT3.5, and ChatGPT-4, as well as non-LLM approaches (Google search and Phenomizer) regarding their ability to identify genetic conditions from textbook-like clinician questions and their corresponding layperson translations related to 63 genetic conditions. For open-source LLMs, larger models were more accurate than smaller LLMs: 7b, 13b, and larger than 33b parameter models obtained accuracy ranges from 21%-49%, 41%-51%, and 54%-68%, respectively. Closed-source LLMs outperformed open-source LLMs, with ChatGPT-4 performing best (89%-90%). Three of 11 LLMs and Google search had significant performance gaps between clinician and layperson prompts. We also evaluated how in-context prompting and keyword removal affected open-source LLM performance. Models were provided with 2 types of in-context prompts: list-type prompts, which improved LLM performance, and definition-type prompts, which did not. We further analyzed removal of rare terms from descriptions, which decreased accuracy for 5 of 7 evaluated LLMs. Finally, we observed much lower performance with real individuals' descriptions; LLMs answered these questions with a maximum 21% accuracy.

7.
Development ; 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39177163

ABSTRACT

One of the key tissue movements driving closure of a wound is re-epithelialisation. Earlier wound healing studies have described the dynamic cell behaviours that contribute to wound re-epithelialisation, including cell division, cell shape changes and cell migration, as well as the signals that might regulate these cell behaviours. Here, we use a series of deep learning tools to quantify the contributions of each of these cell behaviours from movies of repairing wounds in the Drosophila pupal wing epithelium. We test how each is altered following knockdown of the conserved wound repair signals, Ca2+ and JNK, as well as ablation of macrophages which supply growth factor signals believed to orchestrate aspects of the repair process. Our genetic perturbation experiments provide quantifiable insights regarding how these wound signals impact cell behaviours. We find that Ca2+ signalling is a master regulator required for all contributing cell behaviours; JNK signalling primarily drives cell shape changes and divisions, whereas signals from macrophages regulate largely cell migration and proliferation. Our studies show AI to be a valuable tool for unravelling complex signalling hierarchies underlying tissue repair.

8.
Development ; 151(20), 2024 Oct 15.
Article in English | MEDLINE | ID: mdl-38619327

ABSTRACT

Tissue morphogenesis is intimately linked to the changes in shape and organisation of individual cells. In curved epithelia, cells can intercalate along their own apicobasal axes, adopting a shape named 'scutoid' that allows energy minimization in the tissue. Although several geometric and biophysical factors have been associated with this 3D reorganisation, the dynamic changes underlying scutoid formation in 3D epithelial packing remain poorly understood. Here, we use live imaging of the sea star embryo coupled with deep learning-based segmentation to dissect the relative contributions of cell density, tissue compaction and cell proliferation on epithelial architecture. We find that tissue compaction, which naturally occurs in the embryo, is necessary for the appearance of scutoids. Physical compression experiments identify cell density as the factor promoting scutoid formation at a global level. Finally, the comparison of the developing embryo with computational models indicates that the increase in the proportion of scutoids is directly associated with cell divisions. Our results suggest that apico-basal intercalations appearing immediately after mitosis may help accommodate the new cells within the tissue. We propose that proliferation in a compact epithelium induces 3D cell rearrangements during development.


Subject(s)
Cell Proliferation , Embryo, Nonmammalian , Morphogenesis , Animals , Epithelium , Embryo, Nonmammalian/cytology , Cell Count , Starfish/embryology , Epithelial Cells/cytology , Epithelial Cells/metabolism , Cell Division
9.
Proc Natl Acad Sci U S A ; 121(12): e2314600121, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38470920

ABSTRACT

Global atmospheric methane concentrations rose by 10 to 15 ppb/y in the 1980s before abruptly slowing to 2 to 8 ppb/y in the early 1990s. This period in the 1990s is known as the "methane slowdown" and has been attributed in part to the collapse of the former Soviet Union (USSR) in December 1991, which may have decreased the methane emissions from oil and gas operations. Here, we develop a methane plume detection system based on probabilistic deep learning and human-labeled training data. We use this method to detect methane plumes from Landsat 5 satellite observations over Turkmenistan from 1986 to 2011. We focus on Turkmenistan because economic data suggest it could account for half of the decline in oil and gas emissions from the former USSR. We find an increase in both the frequency of methane plume detections and the magnitude of methane emissions following the collapse of the USSR. We estimate a national loss rate from oil and gas infrastructure in Turkmenistan of more than 10% at times, which suggests the socioeconomic turmoil led to a lack of oversight and widespread infrastructure failure in the oil and gas sector. Our finding of increased oil and gas methane emissions from Turkmenistan following the USSR's collapse casts doubt on the long-standing hypothesis regarding the methane slowdown, begging the question: "what drove the 1992 methane slowdown?"

10.
Proc Natl Acad Sci U S A ; 121(12): e2310002121, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38470929

ABSTRACT

We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures, sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations lie on the same manifold in the prediction space. We study the details of this manifold to find that networks with different architectures follow distinguishable trajectories, but other factors have a minimal influence; larger networks train along a similar manifold as that of smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.

11.
Proc Natl Acad Sci U S A ; 121(27): e2311808121, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38913886

ABSTRACT

Modeling complex physical dynamics is a fundamental task in science and engineering. Traditional physics-based models are first-principled, explainable, and sample-efficient. However, they often rely on strong modeling assumptions and expensive numerical integration, requiring significant computational resources and domain expertise. While deep learning (DL) provides efficient alternatives for modeling complex dynamics, it requires a large amount of labeled training data. Furthermore, its predictions may disobey the governing physical laws and are difficult to interpret. Physics-guided DL aims to integrate first-principled physical knowledge into data-driven methods. It has the best of both worlds and is well equipped to better solve scientific problems. Recently, this field has made great progress and has drawn considerable interest across disciplines. Here, we introduce the framework of physics-guided DL with a special emphasis on learning dynamical systems. We describe the learning pipeline and categorize state-of-the-art methods under this framework. We also offer our perspectives on the open challenges and emerging opportunities.

12.
Proc Natl Acad Sci U S A ; 121(26): e2319811121, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38889146

ABSTRACT

Rational design of plant cis-regulatory DNA sequences without expert intervention or prior domain knowledge is still a daunting task. Here, we developed PhytoExpr, a deep learning framework capable of predicting both mRNA abundance and plant species using the proximal regulatory sequence as the sole input. PhytoExpr was trained over 17 species representative of major clades of the plant kingdom to enhance its generalizability. Via input perturbation, quantitative functional annotation of the input sequence was achieved at single-nucleotide resolution, revealing an abundance of predicted high-impact nucleotides in conserved noncoding sequences and transcription factor binding sites. Evaluation of maize HapMap3 single-nucleotide polymorphisms (SNPs) by PhytoExpr demonstrates an enrichment of predicted high-impact SNPs in cis-eQTL. Additionally, we provided two algorithms that harnessed the power of PhytoExpr in designing functional cis-regulatory variants, and de novo creation of species-specific cis-regulatory sequences through in silico evolution of random DNA sequences. Our model represents a general and robust approach for functional variant discovery in population genetics and rational design of regulatory sequences for genome editing and synthetic biology.


Subject(s)
Polymorphism, Single Nucleotide , Regulatory Sequences, Nucleic Acid , Zea mays , Regulatory Sequences, Nucleic Acid/genetics , Zea mays/genetics , Quantitative Trait Loci , Algorithms , Gene Expression Regulation, Plant , Deep Learning , Plants/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Models, Genetic , Genes, Plant , Binding Sites/genetics
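
The input-perturbation step described in the PhytoExpr abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_model` is a hypothetical stand-in for a trained predictor (it is not PhytoExpr), and `perturbation_scores` is a hypothetical helper that scores each position by the largest change in model output over the three possible single-base substitutions.

```python
BASES = "ACGT"


def toy_model(seq):
    """Hypothetical stand-in for a trained expression predictor (NOT PhytoExpr).

    Scores a sequence as the number of 'TATA' motifs plus its GC fraction.
    """
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return seq.count("TATA") + gc


def perturbation_scores(seq, model):
    """Per-position impact: max absolute output change over all substitutions."""
    reference = model(seq)
    scores = []
    for i, base in enumerate(seq):
        deltas = [
            abs(model(seq[:i] + alt + seq[i + 1:]) - reference)
            for alt in BASES
            if alt != base
        ]
        scores.append(max(deltas))
    return scores


# Illustrative input: a short sequence containing one TATA motif (positions 3-6).
seq = "GGCTATAAGC"
scores = perturbation_scores(seq, toy_model)
```

With this toy scorer, positions inside the motif receive higher scores than flanking positions, which is the kind of single-nucleotide annotation the abstract describes; a real model would replace `toy_model`.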
13.
Proc Natl Acad Sci U S A ; 121(6): e2314853121, 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38285937

ABSTRACT

Amino acid mutations that lower a protein's thermodynamic stability are implicated in numerous diseases, and engineered proteins with enhanced stability can be important in research and medicine. Computational methods for predicting how mutations perturb protein stability are, therefore, of great interest. Despite recent advancements in protein design using deep learning, in silico prediction of stability changes has remained challenging, in part due to a lack of large, high-quality training datasets for model development. Here, we describe ThermoMPNN, a deep neural network trained to predict stability changes for protein point mutations given an initial structure. In doing so, we demonstrate the utility of a recently released megascale stability dataset for training a robust stability model. We also employ transfer learning to leverage a second, larger dataset by using learned features extracted from ProteinMPNN, a deep neural network trained to predict a protein's amino acid sequence given its three-dimensional structure. We show that our method achieves state-of-the-art performance on established benchmark datasets using a lightweight model architecture that allows for rapid, scalable predictions. Finally, we make ThermoMPNN readily available as a tool for stability prediction and design.


Subject(s)
Neural Networks, Computer , Proteins , Proteins/genetics , Proteins/chemistry , Amino Acid Sequence , Protein Stability , Machine Learning
14.
Proc Natl Acad Sci U S A ; 121(9): e2309624121, 2024 Feb 27.
Article in English | MEDLINE | ID: mdl-38381782

ABSTRACT

We propose Multiscale Flow, a generative Normalizing Flow that creates samples and models the field-level likelihood of two-dimensional cosmological data such as weak lensing. Multiscale Flow uses hierarchical decomposition of cosmological fields via a wavelet basis and then models different wavelet components separately as Normalizing Flows. The log-likelihood of the original cosmological field can be recovered by summing over the log-likelihood of each wavelet term. This decomposition allows us to separate the information from different scales and identify distribution shifts in the data such as unknown scale-dependent systematics. The resulting likelihood analysis can not only identify these types of systematics, but can also be made optimal, in the sense that the Multiscale Flow can learn the full likelihood at the field without any dimensionality reduction. We apply Multiscale Flow to weak lensing mock datasets for cosmological inference and show that it significantly outperforms traditional summary statistics such as power spectrum and peak counts, as well as machine learning-based summary statistics such as scattering transform and convolutional neural networks. We further show that Multiscale Flow is able to identify distribution shifts not in the training data such as baryonic effects. Finally, we demonstrate that Multiscale Flow can be used to generate realistic samples of weak lensing data.
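
The statement that the field-level log-likelihood is recovered by summing the log-likelihoods of the wavelet terms can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: an orthonormal Haar basis is assumed (the abstract only says "a wavelet basis"), and a unit Gaussian density stands in for each per-component Normalizing Flow. All function names are hypothetical.

```python
import math
import random


def haar_step(x):
    """One level of an orthonormal 2D Haar transform on a square, even-sized grid."""
    n = len(x) // 2
    approx, horiz, vert, diag = ([[0.0] * n for _ in range(n)] for _ in range(4))
    for i in range(n):
        for j in range(n):
            a, b = x[2 * i][2 * j], x[2 * i][2 * j + 1]
            c, d = x[2 * i + 1][2 * j], x[2 * i + 1][2 * j + 1]
            approx[i][j] = (a + b + c + d) / 2.0
            horiz[i][j] = (a - b + c - d) / 2.0
            vert[i][j] = (a + b - c - d) / 2.0
            diag[i][j] = (a - b - c + d) / 2.0
    return approx, [horiz, vert, diag]


def haar_decompose(x, levels):
    """Return all detail terms plus the final coarse approximation."""
    terms = []
    for _ in range(levels):
        x, details = haar_step(x)
        terms.extend(details)
    terms.append(x)
    return terms


def gaussian_loglik(grid):
    """Unit-Gaussian log-density; a stand-in (assumption) for a learned flow."""
    values = [v for row in grid for v in row]
    return -0.5 * sum(v * v for v in values) - 0.5 * len(values) * math.log(2 * math.pi)


def field_loglik(x, levels=2):
    """Field log-likelihood as the sum of per-wavelet-term log-likelihoods."""
    return sum(gaussian_loglik(t) for t in haar_decompose(x, levels))


rng = random.Random(0)
field = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(8)]
total = field_loglik(field, levels=2)
```

Because the Haar transform used here is orthonormal, no Jacobian correction is needed, and with Gaussian stand-ins the summed value matches the log-likelihood evaluated directly on the field. With learned flows the per-term densities would differ, but the summation structure is the same.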

15.
Proc Natl Acad Sci U S A ; 121(35): e2410662121, 2024 Aug 27.
Article in English | MEDLINE | ID: mdl-39163334

ABSTRACT

Proteins perform their biological functions through motion. Although high throughput prediction of the three-dimensional static structures of proteins has proved feasible using deep-learning-based methods, predicting the conformational motions remains a challenge. Purely data-driven machine learning methods have difficulty addressing such motions because available laboratory data on conformational motions are still limited. In this work, we develop a method for generating protein allosteric motions by integrating physical energy landscape information into deep-learning-based methods. We show that local energetic frustration, which represents a quantification of the local features of the energy landscape governing protein allosteric dynamics, can be utilized to empower AlphaFold2 (AF2) to predict protein conformational motions. Starting from ground state static structures, this integrative method generates alternative structures as well as pathways of protein conformational motions, using a progressive enhancement of the energetic frustration features in the input multiple sequence alignment sequences. For the model protein adenylate kinase, we show that the generated conformational motions are consistent with available experimental and molecular dynamics simulation data. Applying the method to two other proteins, KaiB and ribose-binding protein, which undergo large-amplitude conformational changes, also successfully generates the alternative conformations. We also show how to extract overall features of the AF2 energy landscape topography, which has been considered by many to be a black box. Incorporating physical knowledge into deep-learning-based structure prediction algorithms provides a useful strategy to address the challenges of dynamic structure prediction of allosteric proteins.


Subject(s)
Molecular Dynamics Simulation , Protein Conformation , Proteins/chemistry , Adenylate Kinase/chemistry , Adenylate Kinase/metabolism , Allosteric Regulation , Deep Learning
16.
Proc Natl Acad Sci U S A ; 121(6): e2313360121, 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38294935

ABSTRACT

A central challenge in the study of intrinsically disordered proteins is the characterization of the mechanisms by which they bind their physiological interaction partners. Here, we utilize a deep learning-based Markov state modeling approach to characterize the folding-upon-binding pathways observed in a long timescale molecular dynamics simulation of a disordered region of the measles virus nucleoprotein NTAIL reversibly binding the X domain of the measles virus phosphoprotein complex. We find that folding-upon-binding predominantly occurs via two distinct encounter complexes that are differentiated by the binding orientation, helical content, and conformational heterogeneity of NTAIL. We observe that folding-upon-binding predominantly proceeds through a multi-step induced fit mechanism with several intermediates and do not find evidence for the existence of canonical conformational selection pathways. We observe four kinetically separated native-like bound states that interconvert on timescales of eighty to five hundred nanoseconds. These bound states share a core set of native intermolecular contacts and stable NTAIL helices and are differentiated by a sequential formation of native and non-native contacts and additional helical turns. Our analyses provide an atomic resolution structural description of intermediate states in a folding-upon-binding pathway and elucidate the nature of the kinetic barriers between metastable states in a dynamic and heterogeneous, or "fuzzy", protein complex.


Subject(s)
Intrinsically Disordered Proteins , Intrinsically Disordered Proteins/chemistry , Protein Folding , Protein Binding , Molecular Dynamics Simulation
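
For readers unfamiliar with how a Markov state model yields interconversion timescales such as those quoted in the abstract above, here is a minimal sketch using the standard implied-timescale relation t = -τ / ln(λ₂) for a two-state model. The transition probabilities and lag time are hypothetical values chosen for illustration; they are not taken from the paper, and the function name is an assumption.

```python
import math


def two_state_relaxation_time(p_ab, p_ba, lag_ns):
    """Implied timescale (ns) of a two-state Markov model.

    For the transition matrix [[1 - p_ab, p_ab], [p_ba, 1 - p_ba]] the
    eigenvalues are 1 and 1 - p_ab - p_ba; the second one sets the
    relaxation time via t = -lag / ln(lambda_2).
    """
    lam2 = 1.0 - p_ab - p_ba
    if not 0.0 < lam2 < 1.0:
        raise ValueError("second eigenvalue must lie strictly between 0 and 1")
    return -lag_ns / math.log(lam2)


# Hypothetical numbers: 1% switching probability each way per 2 ns lag time.
timescale_ns = two_state_relaxation_time(0.01, 0.01, lag_ns=2.0)
```

Models with more states apply the same relation to each non-unit eigenvalue of the estimated transition matrix.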
17.
Proc Natl Acad Sci U S A ; 121(6): e2300838121, 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38300863

ABSTRACT

Proteins play a central role in biology from immune recognition to brain activity. While major advances in machine learning have improved our ability to predict protein structure from sequence, determining protein function from its sequence or structure remains a major challenge. Here, we introduce holographic convolutional neural network (H-CNN) for proteins, which is a physically motivated machine learning approach to model amino acid preferences in protein structures. H-CNN reflects physical interactions in a protein structure and recapitulates the functional information stored in evolutionary data. H-CNN accurately predicts the impact of mutations on protein stability and binding of protein complexes. Our interpretable computational model for protein structure-function maps could guide design of novel proteins with desired function.


Subject(s)
Algorithms , Neural Networks, Computer , Proteins/genetics , Machine Learning , Amino Acids
18.
J Cell Sci ; 137(4), 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38264939

ABSTRACT

Filopodia are slender, actin-filled membrane projections used by various cell types for environment exploration. Analyzing filopodia often involves visualizing them using actin, filopodia tip or membrane markers. Due to the diversity of cell types that extend filopodia, from amoeboid to mammalian, it can be challenging for some to find a reliable filopodia analysis workflow suited for their cell type and preferred visualization method. The lack of an automated workflow capable of analyzing amoeboid filopodia with only a filopodia tip label prompted the development of filoVision. filoVision is an adaptable deep learning platform featuring the tools filoTips and filoSkeleton. filoTips labels filopodia tips and the cytosol using a single tip marker, allowing information extraction without actin or membrane markers. In contrast, filoSkeleton combines tip marker signals with actin labeling for a more comprehensive analysis of filopodia shafts in addition to tip protein analysis. The ZeroCostDL4Mic deep learning framework facilitates accessibility and customization for different datasets and cell types, making filoVision a flexible tool for automated analysis of tip-marked filopodia across various cell types and user data.


Subject(s)
Actins , Deep Learning , Animals , Actins/metabolism , Pseudopodia/metabolism , Mammals/metabolism
19.
J Cell Sci ; 137(3), 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38324353

ABSTRACT

Fluorescence microscopy is essential for studying living cells, tissues and organisms. However, the fluorescent light that switches on fluorescent molecules also harms the samples, jeopardizing the validity of results - particularly in techniques such as super-resolution microscopy, which demands extended illumination. Artificial intelligence (AI)-enabled software capable of denoising, image restoration, temporal interpolation or cross-modal style transfer has great potential to rescue live imaging data and limit photodamage. Yet we believe the focus should be on maintaining light-induced damage at levels that preserve natural cell behaviour. In this Opinion piece, we argue that a shift in role for AIs is needed - AI should be used to extract rich insights from gentle imaging rather than recover compromised data from harsh illumination. Although AI can enhance imaging, our ultimate goal should be to uncover biological truths, not just retrieve data. It is essential to prioritize minimizing photodamage over merely pushing technical limits. Our approach is aimed towards gentle acquisition and observation of undisturbed living systems, aligning with the essence of live-cell fluorescence microscopy.


Subject(s)
Artificial Intelligence , Software , Microscopy, Fluorescence
20.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38600666

ABSTRACT

Predicting the drug response of cancer cell lines is crucial for advancing personalized cancer treatment, yet remains challenging due to tumor heterogeneity and individual diversity. In this study, we present a deep learning-based framework named Deep neural network Integrating Prior Knowledge (DIPK), which adopts self-supervised techniques to integrate multiple valuable sources of information, including gene interaction relationships, gene expression profiles and molecular topologies, to enhance prediction accuracy and robustness. We demonstrated the superior performance of DIPK compared to existing methods on both known and novel cells and drugs, underscoring the importance of gene interaction relationships in drug response prediction. In addition, DIPK extends its applicability to single-cell RNA sequencing data, showcasing its capability for single-cell-level response prediction and cell identification. Further, we assess the applicability of DIPK on clinical data. DIPK accurately predicted a higher response to paclitaxel in the pathological complete response (pCR) group compared to the residual disease group, affirming the better response of the pCR group to the chemotherapy compound. We believe that the integration of DIPK into clinical decision-making processes has the potential to enhance individualized treatment strategies for cancer patients.


Subject(s)
Deep Learning , Neoplasms , Humans , Neural Networks, Computer , Neoplasms/drug therapy , Neoplasms/genetics , Cell Line