Results 1 - 20 of 28,325

1.
Cell ; 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39389057

ABSTRACT

Current metagenomic tools can fail to identify highly divergent RNA viruses. We developed a deep learning algorithm, termed LucaProt, to discover highly divergent RNA-dependent RNA polymerase (RdRP) sequences in 10,487 metatranscriptomes generated from diverse global ecosystems. LucaProt integrates both sequence and predicted structural information, enabling the accurate detection of RdRP sequences. Using this approach, we identified 161,979 potential RNA virus species and 180 RNA virus supergroups, including many previously poorly studied groups, as well as RNA virus genomes of exceptional length (up to 47,250 nucleotides) and genomic complexity. A subset of these novel RNA viruses was confirmed by RT-PCR and RNA/DNA sequencing. Newly discovered RNA viruses were present in diverse environments, including air, hot springs, and hydrothermal vents, with virus diversity and abundance varying substantially among ecosystems. This study advances virus discovery, highlights the scale of the virosphere, and provides computational tools to better document the global RNA virome.

2.
Cell ; 187(3): 526-544, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38306980

ABSTRACT

Methods from artificial intelligence (AI) trained on large datasets of sequences and structures can now "write" proteins with new shapes and molecular functions de novo, without starting from proteins found in nature. In this Perspective, I will discuss the state of the field of de novo protein design at the juncture of physics-based modeling approaches and AI. New protein folds and higher-order assemblies can be designed with considerable experimental success rates, and difficult problems requiring tunable control over protein conformations and precise shape complementarity for molecular recognition are coming into reach. Emerging approaches incorporate engineering principles (tunability, controllability, and modularity) into the design process from the beginning. Exciting frontiers lie in deconstructing cellular functions with de novo proteins and, conversely, constructing synthetic cellular signaling from the ground up. As methods improve, many more challenges are unsolved.


Subject(s)
Artificial Intelligence , Proteins , Protein Conformation , Proteins/chemistry , Proteins/metabolism , Protein Engineering , Deep Learning
3.
Cell ; 186(22): 4868-4884.e12, 2023 Oct 26.
Article in English | MEDLINE | ID: mdl-37863056

ABSTRACT

Single-cell analysis in living humans is essential for understanding disease mechanisms, but it is impractical in non-regenerative organs, such as the eye and brain, because tissue biopsies would cause serious damage. We resolve this problem by integrating proteomics of liquid biopsies with single-cell transcriptomics from all known ocular cell types to trace the cellular origin of 5,953 proteins detected in the aqueous humor. We identified hundreds of cell-specific protein markers, including for individual retinal cell types. Surprisingly, our results reveal that retinal degeneration occurs in Parkinson's disease, and the cells driving diabetic retinopathy switch with disease stage. Finally, we developed artificial intelligence (AI) models to assess individual cellular aging and found that many eye diseases not associated with chronological age undergo accelerated molecular aging of disease-specific cell types. Our approach, which can be applied to other organ systems, has the potential to transform molecular diagnostics and prognostics while uncovering new cellular disease and aging mechanisms.


Subject(s)
Aging , Aqueous Humor , Artificial Intelligence , Liquid Biopsy , Proteomics , Humans , Aging/metabolism , Aqueous Humor/chemistry , Biopsy , Parkinson Disease/diagnosis
4.
Cell ; 186(7): 1328-1336.e10, 2023 Mar 30.
Article in English | MEDLINE | ID: mdl-37001499

ABSTRACT

Stressed plants show altered phenotypes, including changes in color, smell, and shape. Yet, airborne sounds emitted by stressed plants have not been investigated before. Here we show that stressed plants emit airborne sounds that can be recorded from a distance and classified. We recorded ultrasonic sounds emitted by tomato and tobacco plants inside an acoustic chamber, and in a greenhouse, while monitoring the plant's physiological parameters. We developed machine learning models that succeeded in identifying the condition of the plants, including dehydration level and injury, based solely on the emitted sounds. These informative sounds may also be detectable by other organisms. This work opens avenues for understanding plants and their interactions with the environment and may have significant impact on agriculture.


Subject(s)
Plants , Sound , Stress, Physiological
5.
Cell ; 185(21): 4008-4022.e14, 2022 Oct 13.
Article in English | MEDLINE | ID: mdl-36150393

ABSTRACT

The continual evolution of SARS-CoV-2 and the emergence of variants that show resistance to vaccines and neutralizing antibodies threaten to prolong the COVID-19 pandemic. Selection and emergence of SARS-CoV-2 variants are driven in part by mutations within the viral spike protein and in particular the ACE2 receptor-binding domain (RBD), a primary target site for neutralizing antibodies. Here, we develop deep mutational learning (DML), a machine-learning-guided protein engineering technology, which is used to investigate a massive sequence space of combinatorial mutations, representing billions of RBD variants, by accurately predicting their impact on ACE2 binding and antibody escape. A highly diverse landscape of possible SARS-CoV-2 variants is identified that could emerge from a multitude of evolutionary trajectories. DML may be used for predictive profiling on current and prospective variants, including highly mutated variants such as Omicron, thus guiding the development of therapeutic antibody treatments and vaccines for COVID-19.


Subject(s)
Angiotensin-Converting Enzyme 2/metabolism , COVID-19 , SARS-CoV-2 , Spike Glycoprotein, Coronavirus/metabolism , Angiotensin-Converting Enzyme 2/chemistry , Angiotensin-Converting Enzyme 2/genetics , Antibodies, Neutralizing , Antibodies, Viral , COVID-19 Vaccines , Humans , Mutation , Pandemics , Protein Binding , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/genetics
6.
Cell ; 183(2): 335-346.e13, 2020 Oct 15.
Article in English | MEDLINE | ID: mdl-33035452

ABSTRACT

Muscle spasticity after nervous system injuries and painful low back spasm affect more than 10% of the global population. Current medications are of limited efficacy and cause neurological and cardiovascular side effects because they target upstream regulators of muscle contraction. Direct myosin inhibition could provide optimal muscle relaxation; however, targeting skeletal myosin is particularly challenging because of its similarity to the cardiac isoform. We identified a key residue difference between these myosin isoforms, located in the communication center of the functional regions, which allowed us to design a selective inhibitor, MPH-220. Mutagenic analysis and the atomic structure of MPH-220-bound skeletal muscle myosin confirmed the mechanism of specificity. Targeting skeletal muscle myosin by MPH-220 enabled muscle relaxation, in human and model systems, without cardiovascular side effects and improved spastic gait disorders after brain injury in a disease model. MPH-220 provides a potential nervous-system-independent option to treat spasticity and muscle stiffness.


Subject(s)
Muscle, Skeletal/metabolism , Skeletal Muscle Myosins/drug effects , Skeletal Muscle Myosins/genetics , Adult , Animals , Cardiac Myosins/genetics , Cardiac Myosins/metabolism , Cell Line , Drug Delivery Systems , Female , Humans , Male , Mice , Muscle Contraction/physiology , Muscle Fibers, Skeletal/physiology , Muscle Spasticity/genetics , Muscle Spasticity/physiopathology , Muscle, Skeletal/physiology , Myosins/drug effects , Myosins/genetics , Myosins/metabolism , Protein Isoforms , Rats , Rats, Wistar , Skeletal Muscle Myosins/metabolism
7.
Cell ; 176(3): 535-548.e24, 2019 Jan 24.
Article in English | MEDLINE | ID: mdl-30661751

ABSTRACT

The splicing of pre-mRNAs into mature transcripts is remarkable for its precision, but the mechanisms by which the cellular machinery achieves such specificity are incompletely understood. Here, we describe a deep neural network that accurately predicts splice junctions from an arbitrary pre-mRNA transcript sequence, enabling precise prediction of noncoding genetic variants that cause cryptic splicing. Synonymous and intronic mutations with predicted splice-altering consequence validate at a high rate on RNA-seq and are strongly deleterious in the human population. De novo mutations with predicted splice-altering consequence are significantly enriched in patients with autism and intellectual disability compared to healthy controls and validate against RNA-seq in 21 out of 28 of these patients. We estimate that 9%-11% of pathogenic mutations in patients with rare genetic disorders are caused by this previously underappreciated class of disease variation.


Subject(s)
Forecasting/methods , RNA Precursors/genetics , RNA Splicing/genetics , Algorithms , Alternative Splicing/genetics , Autistic Disorder/genetics , Deep Learning , Exons/genetics , Humans , Intellectual Disability/genetics , Introns/genetics , Neural Networks, Computer , RNA Precursors/metabolism , RNA Splice Sites/genetics , RNA Splice Sites/physiology
8.
Cell ; 172(5): 1122-1131.e9, 2018 Feb 22.
Article in English | MEDLINE | ID: mdl-29474911

ABSTRACT

The implementation of clinical-decision support algorithms for medical imaging faces challenges with reliability and interpretability. Here, we establish a diagnostic tool based on a deep-learning framework for the screening of patients with common treatable blinding retinal diseases. Our framework utilizes transfer learning, which trains a neural network with a fraction of the data of conventional approaches. Applying this approach to a dataset of optical coherence tomography images, we demonstrate performance comparable to that of human experts in classifying age-related macular degeneration and diabetic macular edema. We also provide a more transparent and interpretable diagnosis by highlighting the regions recognized by the neural network. We further demonstrate the general applicability of our AI system for diagnosis of pediatric pneumonia using chest X-ray images. This tool may ultimately aid in expediting the diagnosis and referral of these treatable conditions, thereby facilitating earlier treatment, resulting in improved clinical outcomes.


Subject(s)
Deep Learning , Diagnostic Imaging , Pneumonia/diagnosis , Child , Humans , Neural Networks, Computer , Pneumonia/diagnostic imaging , ROC Curve , Reproducibility of Results , Tomography, Optical Coherence
9.
Physiol Rev ; 103(4): 2423-2450, 2023 Oct 01.
Article in English | MEDLINE | ID: mdl-37104717

ABSTRACT

Artificial intelligence in health care has experienced remarkable innovation and progress in the last decade. Significant advancements can be attributed to the utilization of artificial intelligence to transform physiology data to advance health care. In this review, we explore how past work has shaped the field and defined future challenges and directions. In particular, we focus on three areas of development. First, we give an overview of artificial intelligence, with special attention to the most relevant artificial intelligence models. We then detail how physiology data have been harnessed by artificial intelligence to advance the main areas of health care: automating existing health care tasks, increasing access to care, and augmenting health care capabilities. Finally, we discuss emerging concerns surrounding the use of individual physiology data and detail an increasingly important consideration for the field, namely the challenges of deploying artificial intelligence models to achieve meaningful clinical impact.


Subject(s)
Artificial Intelligence , Delivery of Health Care , Humans
10.
Trends Biochem Sci ; 48(12): 1014-1018, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37833131

ABSTRACT

Generative artificial intelligence (AI) is a burgeoning field with widespread applications, including in science. Here, we explore two paradigms that provide insight into the capabilities and limitations of Chat Generative Pre-trained Transformer (ChatGPT): its ability to (i) define a core biological concept (the Central Dogma of molecular biology); and (ii) interpret the genetic code.


Subject(s)
Artificial Intelligence , Genetic Code , Molecular Biology
11.
Annu Rev Pharmacol Toxicol ; 64: 159-170, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-37562495

ABSTRACT

Health digital twins (HDTs) are virtual representations of real individuals that can be used to simulate human physiology, disease, and drug effects. HDTs can be used to improve drug discovery and development by providing a data-driven approach to inform target selection, drug delivery, and design of clinical trials. HDTs also offer new applications into precision therapies and clinical decision making. The deployment of HDTs at scale could bring a precision approach to public health monitoring and intervention. Next steps include challenges such as addressing socioeconomic barriers and ensuring the representativeness of the technology based on the training and validation data sets. Governance and regulation of HDT technology are still in the early stages.


Subject(s)
Biological Science Disciplines , Humans , Drug Delivery Systems , Drug Discovery , Technology , Delivery of Health Care
12.
Trends Genet ; 40(5): 383-386, 2024 May.
Article in English | MEDLINE | ID: mdl-38637270

ABSTRACT

Artificial intelligence (AI) in omics analysis raises privacy threats to patients. Here, we briefly discuss risk factors to patient privacy in data sharing, model training, and release, as well as methods to safeguard and evaluate patient privacy in AI-driven omics methods.


Subject(s)
Artificial Intelligence , Genomics , Humans , Genomics/methods , Privacy , Information Dissemination
13.
Trends Genet ; 40(10): 891-908, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39117482

ABSTRACT

Harnessing cutting-edge technologies to enhance crop productivity is a pivotal goal in modern plant breeding. Artificial intelligence (AI) is renowned for its prowess in big data analysis and pattern recognition, and is revolutionizing numerous scientific domains including plant breeding. We explore the wider potential of AI tools in various facets of breeding, including data collection, unlocking genetic diversity within genebanks, and bridging the genotype-phenotype gap to facilitate crop breeding. This will enable the development of crop cultivars tailored to the projected future environments. Moreover, AI tools also hold promise for refining crop traits by improving the precision of gene-editing systems and predicting the potential effects of gene variants on plant phenotypes. Leveraging AI-enabled precision breeding can augment the efficiency of breeding programs and holds promise for optimizing cropping systems at the grassroots level. This entails identifying optimal inter-cropping and crop-rotation models to enhance agricultural sustainability and productivity in the field.


Subject(s)
Artificial Intelligence , Crops, Agricultural , Plant Breeding , Plant Breeding/methods , Crops, Agricultural/genetics , Crops, Agricultural/growth & development , Phenotype , Genetic Variation , Gene Editing/methods , Genotype
14.
Annu Rev Genomics Hum Genet ; 25(1): 141-159, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38724019

ABSTRACT

Significant progress has been made in augmenting clinical decision-making using artificial intelligence (AI) in the context of secondary and tertiary care at large academic medical centers. For such innovations to have an impact across the spectrum of care, additional challenges must be addressed, including inconsistent use of preventative care and gaps in chronic care management. The integration of additional data, including genomics and data from wearables, could prove critical in addressing these gaps, but technical, legal, and ethical challenges arise. On the technical side, approaches for integrating complex and messy data are needed. Data and design imperfections like selection bias, missing data, and confounding must be addressed. In terms of legal and ethical challenges, while AI has the potential to aid in leveraging patient data to make clinical care decisions, we also risk exacerbating existing disparities. Organizations implementing AI solutions must carefully consider how they can improve care for all and reduce inequities.


Subject(s)
Artificial Intelligence , Precision Medicine , Humans , Clinical Decision-Making , Genomics/methods
15.
Am J Hum Genet ; 111(9): 1819-1833, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39146935

ABSTRACT

Large language models (LLMs) are generating interest in medical settings. For example, LLMs can respond coherently to medical queries by providing plausible differential diagnoses based on clinical notes. However, there are many questions to explore, such as evaluating differences between open- and closed-source LLMs as well as LLM performance on queries from both medical and non-medical users. In this study, we assessed multiple LLMs, including Llama-2-chat, Vicuna, Medllama2, Bard/Gemini, Claude, ChatGPT3.5, and ChatGPT-4, as well as non-LLM approaches (Google search and Phenomizer) regarding their ability to identify genetic conditions from textbook-like clinician questions and their corresponding layperson translations related to 63 genetic conditions. For open-source LLMs, larger models were more accurate than smaller LLMs: 7b, 13b, and larger than 33b parameter models obtained accuracy ranges from 21%-49%, 41%-51%, and 54%-68%, respectively. Closed-source LLMs outperformed open-source LLMs, with ChatGPT-4 performing best (89%-90%). Three of 11 LLMs and Google search had significant performance gaps between clinician and layperson prompts. We also evaluated how in-context prompting and keyword removal affected open-source LLM performance. Models were provided with 2 types of in-context prompts: list-type prompts, which improved LLM performance, and definition-type prompts, which did not. We further analyzed removal of rare terms from descriptions, which decreased accuracy for 5 of 7 evaluated LLMs. Finally, we observed much lower performance with real individuals' descriptions; LLMs answered these questions with a maximum 21% accuracy.


Subject(s)
Self Report , Humans , Language , Genetic Diseases, Inborn/genetics
16.
Am J Hum Genet ; 111(10): 2190-2202, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39255797

ABSTRACT

Phenotype-driven gene prioritization is fundamental to diagnosing rare genetic disorders. While traditional approaches rely on curated knowledge graphs with phenotype-gene relations, recent advancements in large language models (LLMs) promise a streamlined text-to-gene solution. In this study, we evaluated five LLMs, including two generative pre-trained transformers (GPT) series and three Llama2 series, assessing their performance across task completeness, gene prediction accuracy, and adherence to required output structures. We conducted experiments, exploring various combinations of models, prompts, phenotypic input types, and task difficulty levels. Our findings revealed that the best-performing LLM, GPT-4, achieved an average accuracy of 17.0% in identifying diagnosed genes within the top 50 predictions, which still falls behind traditional tools. However, accuracy increased with the model size. Consistent results were observed over time, as shown in the dataset curated after 2023. Advanced techniques such as retrieval-augmented generation (RAG) and few-shot learning did not improve the accuracy. Sophisticated prompts were more likely to enhance task completeness, especially in smaller models. Conversely, complicated prompts tended to decrease output structure compliance rate. LLMs also achieved better-than-random prediction accuracy with free-text input, though performance was slightly lower than with standardized concept input. Bias analysis showed that highly cited genes, such as BRCA1, TP53, and PTEN, are more likely to be predicted. Our study provides valuable insights into integrating LLMs with genomic analysis, contributing to the ongoing discussion on their utilization in clinical workflows.


Subject(s)
Phenotype , Rare Diseases , Humans , Rare Diseases/genetics , Computational Biology/methods
17.
Proc Natl Acad Sci U S A ; 121(16): e2303165121, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38607932

ABSTRACT

Antimicrobial resistance was estimated to be associated with 4.95 million deaths worldwide in 2019. It is possible to frame the antimicrobial resistance problem as a feedback-control problem. If we could optimize this feedback-control problem and translate our findings to the clinic, we could slow, prevent, or reverse the development of high-level drug resistance. Prior work on this topic has relied on systems where the exact dynamics and parameters were known a priori. In this study, we extend this work using a reinforcement learning (RL) approach capable of learning effective drug cycling policies in a system defined by empirically measured fitness landscapes. Crucially, we show that it is possible to learn effective drug cycling policies despite the problems of noisy, limited, or delayed measurement. Given access to a panel of 15 β-lactam antibiotics with which to treat the simulated Escherichia coli population, we demonstrate that RL agents outperform two naive treatment paradigms at minimizing the population fitness over time. We also show that RL agents approach the performance of the optimal drug cycling policy. Even when stochastic noise is introduced to the measurements of population fitness, we show that RL agents are capable of maintaining evolving populations at lower growth rates compared to controls. We further tested our approach in arbitrary fitness landscapes of up to 1,024 genotypes. We show that minimization of population fitness using drug cycles is not limited by increasing genome size. Our work represents a proof-of-concept for using AI to control complex evolutionary processes.


Subject(s)
Anti-Infective Agents , Learning , Reinforcement, Psychology , Drug Resistance, Microbial , Bicycling , Escherichia coli/genetics
18.
Proc Natl Acad Sci U S A ; 121(41): e2322420121, 2024 Oct 08.
Article in English | MEDLINE | ID: mdl-39365822

ABSTRACT

The widespread adoption of large language models (LLMs) makes it important to recognize their strengths and limitations. We argue that to develop a holistic understanding of these systems, we must consider the problem that they were trained to solve: next-word prediction over Internet text. By recognizing the pressures that this task exerts, we can make predictions about the strategies that LLMs will adopt, allowing us to reason about when they will succeed or fail. Using this approach (which we call the teleological approach), we identify three factors that we hypothesize will influence LLM accuracy: the probability of the task to be performed, the probability of the target output, and the probability of the provided input. To test our predictions, we evaluate five LLMs (GPT-3.5, GPT-4, Claude 3, Llama 3, and Gemini 1.0) on 11 tasks, and we find robust evidence that LLMs are influenced by probability in the hypothesized ways. Many of the experiments reveal surprising failure modes. For instance, GPT-4's accuracy at decoding a simple cipher is 51% when the output is a high-probability sentence but only 13% when it is low-probability, even though this task is a deterministic one for which probability should not matter. These results show that AI practitioners should be careful about using LLMs in low-probability situations. More broadly, we conclude that we should not evaluate LLMs as if they are humans but should instead treat them as a distinct type of system, one that has been shaped by its own particular set of pressures.


Subject(s)
Language , Humans , Models, Theoretical
19.
Proc Natl Acad Sci U S A ; 121(18): e2307304121, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38640257

ABSTRACT

Over the past few years, machine learning models have significantly increased in size and complexity, especially in the area of generative AI such as large language models. These models require massive amounts of data and compute capacity to train, to the extent that concerns over the training data (such as protected or private content) cannot be practically addressed by retraining the model "from scratch" with the questionable data removed or altered. Furthermore, despite significant efforts and controls dedicated to ensuring that training corpora are properly curated and composed, the sheer volume required makes it infeasible to manually inspect each datum comprising a training corpus. One potential approach to training corpus data defects is model disgorgement, by which we broadly mean the elimination or reduction of not only any improperly used data, but also the effects of improperly used data on any component of an ML model. Model disgorgement techniques can be used to address a wide range of issues, such as reducing bias or toxicity, increasing fidelity, and ensuring responsible use of intellectual property. In this paper, we survey the landscape of model disgorgement methods and introduce a taxonomy of disgorgement techniques that are applicable to modern ML systems. In particular, we investigate the various meanings of "removing the effects" of data on the trained model in a way that does not require retraining from scratch.


Subject(s)
Language , Machine Learning
20.
Hum Mol Genet ; 33(15): 1367-1377, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-38704739

ABSTRACT

Spinal Muscular Atrophy is caused by partial loss of survival of motoneuron (SMN) protein expression. The numerous interaction partners and mechanisms influenced by SMN loss result in a complex disease. Current treatments restore SMN protein levels to a certain extent, but do not cure all symptoms. The prolonged survival of patients creates an increasing need for a better understanding of SMA. Although many SMN-protein interactions, dysregulated pathways, and organ phenotypes are known, the connections among them remain largely unexplored. Monogenic diseases are ideal examples for the exploration of cause-and-effect relationships to create a network describing the disease-context. Machine learning tools can utilize such knowledge to analyze similarities between disease-relevant molecules and molecules not described in the disease so far. We used an artificial intelligence-based algorithm to predict new genes of interest. The transcriptional regulation of 8 out of 13 molecules selected from the predicted set was successfully validated in an SMA mouse model. This bioinformatic approach, using the given experimental knowledge for relevance predictions, enhances efficient targeted research in SMA and potentially in other disease settings.


Subject(s)
Artificial Intelligence , Computational Biology , Disease Models, Animal , Muscular Atrophy, Spinal , Muscular Atrophy, Spinal/genetics , Muscular Atrophy, Spinal/metabolism , Animals , Mice , Humans , Computational Biology/methods , Survival of Motor Neuron 1 Protein/genetics , Survival of Motor Neuron 1 Protein/metabolism , Machine Learning , Algorithms , Gene Expression Regulation/genetics