1.
Cell ; 173(7): 1581-1592, 2018 06 14.
Article in English | MEDLINE | ID: mdl-29887378

ABSTRACT

Machine learning, a collection of data-analytical techniques aimed at building predictive models from multi-dimensional datasets, is becoming integral to modern biological research. By enabling one to generate models that learn from large datasets and make predictions on likely outcomes, machine learning can be used to study complex cellular systems such as biological networks. Here, we provide a primer on machine learning for life scientists, including an introduction to deep learning. We discuss opportunities and challenges at the intersection of machine learning and network biology, which could impact disease biology, drug discovery, microbiome research, and synthetic biology.


Subject(s)
Computational Biology/methods , Machine Learning , Algorithms , Databases, Factual , Drug Discovery , Drug-Related Side Effects and Adverse Reactions , Humans , Microbiota , Neural Networks, Computer
2.
Proc Natl Acad Sci U S A ; 121(24): e2318124121, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38830100

ABSTRACT

There is much excitement about the opportunity to harness the power of large language models (LLMs) when building problem-solving assistants. However, the standard methodology of evaluating LLMs relies on static pairs of inputs and outputs; this is insufficient for making an informed decision about which LLMs are best to use in an interactive setting, and how that varies by setting. Static assessment therefore limits how we understand language model capabilities. We introduce CheckMate, an adaptable prototype platform for humans to interact with and evaluate LLMs. We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics, with a mixed cohort of participants from undergraduate students to professors of mathematics. We release the resulting interaction and rating dataset, MathConverse. By analyzing MathConverse, we derive a taxonomy of human query behaviors and uncover that despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness in LLM generations, among other findings. Further, we garner a more granular understanding of GPT-4 mathematical problem-solving through a series of case studies, contributed by experienced mathematicians. We conclude with actionable takeaways for ML practitioners and mathematicians: models that communicate uncertainty, respond well to user corrections, and can provide a concise rationale for their recommendations, may constitute better assistants. Humans should inspect LLM output carefully given their current shortcomings and potential for surprising fallibility.


Subject(s)
Language , Mathematics , Problem Solving , Humans , Problem Solving/physiology , Students/psychology
3.
Proc Natl Acad Sci U S A ; 118(27)2021 07 06.
Article in English | MEDLINE | ID: mdl-34187888

ABSTRACT

Recent progress in DNA synthesis and sequencing technology has enabled systematic studies of protein function at a massive scale. We explore a deep mutational scanning study that measured the transcriptional repression function of 43,669 variants of the Escherichia coli LacI protein. We analyze structural and evolutionary aspects that relate to how the function of this protein is maintained, including an in-depth look at the C-terminal domain. We develop a deep neural network to predict transcriptional repression mediated by the lac repressor of Escherichia coli using experimental measurements of variant function. When measured across 10 separate training and validation splits using 5,009 single mutations of the lac repressor, our best-performing model achieved a median Pearson correlation of 0.79, exceeding any previous model. We demonstrate that deep representation learning approaches, first trained in an unsupervised manner across millions of diverse proteins, can be fine-tuned in a supervised fashion using lac repressor experimental datasets to more effectively predict a variant's effect on repression. These findings suggest a deep representation learning model may improve the prediction of other important properties of proteins.
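The evaluation described in this abstract, a median Pearson correlation over 10 training and validation splits, can be illustrated with a minimal sketch. It assumes simple random splits and a user-supplied predictor; the function names, the `val_fraction` value, and the toy data are all hypothetical and are not taken from the study, whose actual splitting scheme and model are not described here.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)  # undefined (division by zero) for constant input


def median_pearson_over_splits(measured, predict, n_splits=10,
                               val_fraction=0.2, seed=0):
    """Median validation-set Pearson r over repeated random splits.

    `predict(train_idx, val_idx)` is a hypothetical callback: it fits any
    model on the training indices and returns one prediction per
    validation index, in the same order as `val_idx`.
    """
    rng = random.Random(seed)
    indices = list(range(len(measured)))
    n_val = max(2, int(len(indices) * val_fraction))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        val_idx, train_idx = indices[:n_val], indices[n_val:]
        preds = predict(train_idx, val_idx)
        scores.append(pearson([measured[i] for i in val_idx], preds))
    return statistics.median(scores)


# Toy data only: a noisy "model" standing in for a trained network.
measured = [0.1 * i for i in range(50)]


def toy_predict(train_idx, val_idx):
    noise = random.Random(1)
    return [measured[i] + noise.gauss(0, 0.5) for i in val_idx]


print(median_pearson_over_splits(measured, toy_predict))
```

Reporting the median rather than the mean makes the summary less sensitive to a single unusually easy or hard split.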


Subject(s)
Deep Learning , Escherichia coli Proteins/metabolism , Lac Repressors/metabolism , Transcription, Genetic , Epistasis, Genetic , Escherichia coli Proteins/genetics , Lac Repressors/genetics , Mutation/genetics , Protein Domains , Reproducibility of Results
4.
Fam Community Health ; 45(2): 59-66, 2022.
Article in English | MEDLINE | ID: mdl-35125488

ABSTRACT

Mixed-status families, whose members have multiple immigration statuses, are common in US immigrant communities. Large-scale worksite raids, an immigration enforcement tactic used throughout US history, returned during the Trump administration. Yet, little research characterizes the impacts of these raids, especially as related to mixed-status families. The current study (1) describes a working definition of a large-scale worksite raid and (2) considers impacts of these raids on mixed-status families. We conducted semistructured interviews in Spanish and English at 6 communities that experienced the largest worksite raids in 2018. Participants were 77 adults who provided material, emotional, or professional support following raids. Qualitative analysis methods were used to develop a codebook and code all interviews. The unpredictability of worksite raids resulted in chaos and confusion, often stemming from potential family separation. Financial crises followed because of the removal of primary financial providers. In response, families rearranged roles to generate income. Large-scale worksite raids result in similar harms to mixed-status families as other enforcement tactics but on a much larger scale. They also uniquely drain community resources, with long-term impacts. Advocacy and policy efforts are needed to mitigate damage and end this practice.


Subject(s)
Emigrants and Immigrants , Emigration and Immigration , Adult , Family Relations , Hispanic or Latino , Humans , Workplace
5.
RNA Biol ; 18(sup2): 770-781, 2021 11 12.
Article in English | MEDLINE | ID: mdl-34719327

ABSTRACT

TUT4 and the closely related TUT7 are non-templated poly(U) polymerases required at different stages of development, and their mis-regulation or mutation has been linked to important cancer pathologies. While TUT4(7) interaction with its pre-miRNA targets has been characterized in detail, the molecular bases of the broader target recognition process are unclear. Here, we examine RNA binding by the ZnF domains of the protein. We show that TUT4(7) ZnF2 contains two distinct RNA binding surfaces that are used in the interaction with different RNA nucleobases in different targets, i.e., that this small domain encodes diversity in TUT4(7) selectivity and molecular function. Interestingly, and unlike other well-characterized CCHC ZnFs, ZnF2 is not physically coupled to the flanking ZnF3 and acts independently in miRNA recognition, while the remaining CCHC ZnF of TUT4(7), ZnF1, has lost its intrinsic RNA binding capability. Together, our data suggest that the ZnFs of TUT4(7) are independent units for RNA and, possibly, protein-protein interactions that underlie the protein's functional flexibility and are likely to play an important role in building its interaction network.


Subject(s)
DNA-Binding Proteins/metabolism , Epistasis, Genetic , Gene Expression Regulation , MicroRNAs/genetics , RNA-Binding Proteins/metabolism , Zinc Fingers , Base Composition , DNA-Binding Proteins/chemistry , Humans , Magnetic Resonance Spectroscopy , MicroRNAs/chemistry , MicroRNAs/metabolism , Poly U , Protein Interaction Domains and Motifs , RNA-Binding Proteins/chemistry , Structure-Activity Relationship
6.
Nucleic Acids Res ; 45(11): 6761-6774, 2017 Jun 20.
Article in English | MEDLINE | ID: mdl-28379442

ABSTRACT

RBM10 is an RNA-binding protein that plays an essential role in development and is frequently mutated in the context of human disease. RBM10 recognizes a diverse set of RNA motifs in introns and exons and regulates alternative splicing. However, the molecular mechanisms underlying this seemingly relaxed sequence specificity are not understood and functional studies have focused on 3′ intronic sites only. Here, we dissect the RNA code recognized by RBM10 and relate it to the splicing regulatory function of this protein. We show that a two-domain RRM1-ZnF unit recognizes a GGA-centered motif enriched in RBM10 exonic sites with high affinity and specificity and test that the interaction with these exonic sequences promotes exon skipping. Importantly, a second RRM domain (RRM2) of RBM10 recognizes a C-rich sequence, which explains its known interaction with the intronic 3′ site of NUMB exon 9 contributing to regulation of the Notch pathway in cancer. Together, these findings explain RBM10's broad RNA specificity and suggest that RBM10 functions as a splicing regulator using two RNA-binding units with different specificities to promote exon skipping.


Subject(s)
RNA-Binding Proteins/physiology , Autoantigens , Base Sequence , Binding Sites , Exons , HEK293 Cells , Humans , Protein Binding , RNA Splicing , RNA, Messenger/chemistry , RNA, Messenger/metabolism , RNA-Binding Proteins/chemistry , Zinc Fingers
7.
Nucleic Acids Res ; 43(6): e41, 2015 Mar 31.
Article in English | MEDLINE | ID: mdl-25586222

ABSTRACT

Defining the RNA target selectivity of the proteins regulating mRNA metabolism is a key issue in RNA biology. Here we present a novel use of principal component analysis (PCA) to extract the RNA sequence preference of RNA binding proteins. We show that PCA can be used to compare the changes in the nuclear magnetic resonance (NMR) spectrum of a protein upon binding a set of quasi-degenerate RNAs and define the nucleobase specificity. We couple this application of PCA to an automated NMR spectra recording and processing protocol and obtain an unbiased and high-throughput NMR method for the analysis of nucleobase preference in protein-RNA interactions. We test the method on the RNA binding domains of three important regulators of RNA metabolism.
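The idea of using PCA to compare how a protein's NMR spectrum changes across a set of RNAs can be illustrated with a minimal sketch. It assumes each RNA is summarized as a vector of per-peak chemical shift changes and extracts only the first principal component, using power iteration on the covariance matrix to stay dependency-free. The RNA labels, peak values, and function name are hypothetical toy inputs, and this is not the authors' automated recording and processing protocol.

```python
def first_principal_component(rows, n_iter=200):
    """Return (unit loading vector, per-row scores) for the first PC.

    rows: one list of per-peak shift changes for each RNA tested.
    Uses power iteration, a simple stand-in for a full PCA routine.
    """
    n, d = len(rows), len(rows[0])
    # Center each peak (column) on its mean across RNAs.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix between peaks.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the dominant eigenvector of cov.
    v = [1.0 / d ** 0.5] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Project each centered row onto the component to get its score.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical shift changes (ppm) at three peaks for four RNAs that
# differ by one nucleobase at a single position.
labels = ["A", "C", "G", "U"]
shifts = [
    [0.10, 0.02, 0.00],
    [0.11, 0.03, 0.01],
    [0.40, 0.20, 0.01],
    [0.42, 0.22, 0.00],
]
loading, scores = first_principal_component(shifts)
for label, score in zip(labels, scores):
    print(label, round(score, 3))
```

In this toy input, RNAs that perturb the spectrum in a similar way receive similar scores along the first component, which is the kind of grouping one would inspect when reading out a nucleobase preference.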


Subject(s)
High-Throughput Screening Assays/methods , Nuclear Magnetic Resonance, Biomolecular/methods , RNA-Binding Proteins/metabolism , RNA/genetics , RNA/metabolism , Base Sequence , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism , High-Throughput Screening Assays/statistics & numerical data , Humans , Models, Molecular , Principal Component Analysis , Protein Interaction Domains and Motifs , RNA-Binding Proteins/chemistry , Recombinant Proteins/chemistry , Recombinant Proteins/metabolism , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism , mRNA Cleavage and Polyadenylation Factors/chemistry , mRNA Cleavage and Polyadenylation Factors/metabolism
8.
Cell Rep Methods ; 3(6): 100508, 2023 06 26.
Article in English | MEDLINE | ID: mdl-37426752

ABSTRACT

Understanding how the RNA-binding domains of a protein regulator are used to recognize its RNA targets is a key problem in RNA biology, but RNA-binding domains with very low affinity do not perform well in the methods currently available to characterize protein-RNA interactions. Here, we propose to use conservative mutations that enhance the affinity of RNA-binding domains to overcome this limitation. As a proof of principle, we have designed and validated an affinity-enhanced K-homology (KH) domain mutant of the fragile X syndrome protein FMRP, a key regulator of neuronal development, and used this mutant to determine the domain's sequence preference and to explain FMRP recognition of specific RNA motifs in the cell. Our results validate our concept and our nuclear magnetic resonance (NMR)-based workflow. While effective mutant design requires an understanding of the underlying principles of RNA recognition by the relevant domain type, we expect the method will be used effectively in many RNA-binding domains.


Subject(s)
Fragile X Mental Retardation Protein , RNA , RNA/genetics , Fragile X Mental Retardation Protein/genetics , Proteins/genetics , Mutation , RNA-Binding Motifs/genetics
9.
Microorganisms ; 11(4)2023 Apr 20.
Article in English | MEDLINE | ID: mdl-37110501

ABSTRACT

Bacteria use an array of sigma factors to regulate gene expression during different stages of their life cycles. Full-length, atomic-level structures of sigma factors have been challenging to obtain experimentally as a result of their many regions of intrinsic disorder. AlphaFold has now supplied plausible full-length models for most sigma factors. Here we discuss the current understanding of the structures and functions of sigma factors in the model organism, Bacillus subtilis, and present an X-ray crystal structure of a region of B. subtilis SigE, a sigma factor that plays a critical role in the developmental process of spore formation.

10.
Healthcare (Basel) ; 11(14)2023 Jul 13.
Article in English | MEDLINE | ID: mdl-37510458

ABSTRACT

BACKGROUND: The prevalence of Alzheimer's disease (AD) is projected to increase as the population ages, and current treatments are minimally effective. Transcranial photobiomodulation (t-PBM) with near-infrared (NIR) light penetrates into the cerebral cortex, stimulates the mitochondrial respiratory chain, and increases cerebral blood flow. Preliminary data suggest t-PBM may be efficacious in improving cognition in people with early AD and amnestic mild cognitive impairment (aMCI). METHODS: In this randomized, double-blind, placebo-controlled study with aMCI and early AD participants, we will test the efficacy, safety, and impact on cognition of 24 sessions of t-PBM delivered over 8 weeks. Brain mechanisms of t-PBM in this population will be explored by testing whether the baseline tau burden (measured with 18F-MK6240), or changes in mitochondrial function over 8 weeks (assessed with 31P-MRSI), moderates the changes observed in cognitive functions after t-PBM therapy. We will also use changes in the fMRI Blood-Oxygenation-Level-Dependent (BOLD) signal after a single treatment to demonstrate t-PBM-dependent increases in prefrontal cortex blood flow. CONCLUSION: This study will test whether t-PBM, a low-cost, accessible, and user-friendly intervention, has the potential to improve cognition and function in an aMCI and early AD population.

11.
Cell Syst ; 14(6): 525-542.e9, 2023 06 21.
Article in English | MEDLINE | ID: mdl-37348466

ABSTRACT

The design choices underlying machine-learning (ML) models present important barriers to entry for many biologists who aim to incorporate ML in their research. Automated machine-learning (AutoML) algorithms can address many challenges that come with applying ML to the life sciences. However, these algorithms are rarely used in systems and synthetic biology studies because they typically do not explicitly handle biological sequences (e.g., nucleotide, amino acid, or glycan sequences) and cannot be easily compared with other AutoML algorithms. Here, we present BioAutoMATED, an AutoML platform for biological sequence analysis that integrates multiple AutoML methods into a unified framework. Users are automatically provided with relevant techniques for analyzing, interpreting, and designing biological sequences. BioAutoMATED predicts gene regulation, peptide-drug interactions, and glycan annotation, and designs optimized synthetic biology components, revealing salient sequence characteristics. By automating sequence modeling, BioAutoMATED allows life scientists to incorporate ML more readily into their work.


Subject(s)
Algorithms , Machine Learning
12.
Nat Commun ; 11(1): 5058, 2020 10 07.
Article in English | MEDLINE | ID: mdl-33028819

ABSTRACT

While synthetic biology has revolutionized our approaches to medicine, agriculture, and energy, the design of completely novel biological circuit components beyond naturally-derived templates remains challenging due to poorly understood design rules. Toehold switches, which are programmable nucleic acid sensors, face an analogous design bottleneck; our limited understanding of how sequence impacts functionality often necessitates expensive, time-consuming screens to identify effective switches. Here, we introduce Sequence-based Toehold Optimization and Redesign Model (STORM) and Nucleic-Acid Speech (NuSpeak), two orthogonal and synergistic deep learning architectures to characterize and optimize toeholds. Applying techniques from computer vision and natural language processing, we 'un-box' our models using convolutional filters, attention maps, and in silico mutagenesis. Through transfer-learning, we redesign sub-optimal toehold sensors, even with sparse training data, experimentally validating their improved performance. This work provides sequence-to-function deep learning frameworks for toehold selection and design, augmenting our ability to construct potent biological circuit components and precision diagnostics.


Subject(s)
Biotechnology/methods , Deep Learning , Genetic Engineering/methods , Riboswitch/genetics , Synthetic Biology/methods , Base Sequence/genetics , Computer Simulation , Datasets as Topic , Genome, Human/genetics , Genome, Viral/genetics , Humans , Models, Genetic , Mutagenesis , Natural Language Processing , Structure-Activity Relationship
13.
Structure ; 26(4): 640-648.e5, 2018 04 03.
Article in English | MEDLINE | ID: mdl-29526435

ABSTRACT

Global changes in bacterial gene expression can be orchestrated by the coordinated activation/deactivation of alternative sigma (σ) factor subunits of RNA polymerase. Sigma factors themselves are regulated in myriad ways, including via anti-sigma factors. Here, we have determined the solution structure of anti-sigma factor CsfB, responsible for inhibition of two alternative sigma factors, σG and σE, during spore formation by Bacillus subtilis. CsfB assembles into a symmetrical homodimer, with each monomer bound to a single Zn2+ ion via a treble-clef zinc finger fold. Directed mutagenesis indicates that dimer formation is critical for CsfB-mediated inhibition of both σG and σE, and we have characterized these interactions in vitro. This work represents an advance in our understanding of how CsfB mediates inhibition of two alternative sigma factors to drive developmental gene expression in a bacterium.


Subject(s)
Bacillus subtilis/chemistry , Gene Expression Regulation, Bacterial , Repressor Proteins/chemistry , Sigma Factor/chemistry , Spores, Bacterial/chemistry , Zinc/chemistry , Amino Acid Sequence , Bacillus subtilis/genetics , Bacillus subtilis/metabolism , Binding Sites , Cations, Divalent , Cloning, Molecular , Crystallography, X-Ray , Escherichia coli/genetics , Escherichia coli/metabolism , Genetic Vectors/chemistry , Genetic Vectors/metabolism , Models, Molecular , Mutation , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , Protein Isoforms/antagonists & inhibitors , Protein Isoforms/chemistry , Protein Isoforms/genetics , Protein Isoforms/metabolism , Protein Multimerization , Recombinant Proteins/chemistry , Recombinant Proteins/genetics , Recombinant Proteins/metabolism , Repressor Proteins/genetics , Repressor Proteins/metabolism , Sequence Alignment , Sequence Homology, Amino Acid , Sigma Factor/antagonists & inhibitors , Sigma Factor/genetics , Sigma Factor/metabolism , Spores, Bacterial/genetics , Spores, Bacterial/metabolism , Zinc/metabolism