Results 1 - 20 of 28,804
1.
Cell ; 174(6): 1424-1435.e15, 2018 09 06.
Article in English | MEDLINE | ID: mdl-30078708

ABSTRACT

FOXP2, initially identified for its role in human speech, contains two nonsynonymous substitutions derived in the human lineage. Evidence for a recent selective sweep in Homo sapiens, however, is at odds with the presence of these substitutions in archaic hominins. Here, we comprehensively reanalyze FOXP2 in hundreds of globally distributed genomes to test for recent selection. We do not find evidence of recent positive or balancing selection at FOXP2. Instead, the original signal appears to have been due to sample composition. Our tests do identify an intronic region that is enriched for highly conserved sites that are polymorphic among humans, compatible with a loss of function in humans. This region is lowly expressed in relevant tissue types that were tested via RNA-seq in human prefrontal cortex and RT-PCR in immortalized human brain cells. Our results represent a substantial revision to the adaptive history of FOXP2, a gene regarded as vital to human evolution.


Subject(s)
Forkhead Transcription Factors/genetics, Brain/cytology, Brain/metabolism, Cell Line, Genetic Databases, Exons, Female, Human Genome, Haplotypes, Humans, Introns, Male, Markov Chains, Single Nucleotide Polymorphism, Prefrontal Cortex/metabolism
2.
Mol Cell ; 84(7): 1257-1270.e6, 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-38377993

ABSTRACT

Current base editors (BEs) use DNA deaminases, including cytidine deaminase in cytidine BE (CBE) or adenine deaminase in adenine BE (ABE), to facilitate transition nucleotide substitutions. Combining CBE or ABE with glycosylase enzymes can induce limited transversion mutations. Nonetheless, a critical demand remains for BEs capable of generating alternative mutation types, such as T>G corrections. In this study, we leveraged pre-trained protein language models to optimize a uracil-N-glycosylase (UNG) variant with altered specificity for thymines (eTDG). Notably, after two rounds of testing fewer than 50 top-ranking variants, more than 50% exhibited over 1.5-fold enhancement in enzymatic activity. When eTDG was fused with nCas9, it induced programmable T-to-S (G/C) substitutions and corrected the db/db diabetic mutation in mice (up to 55%). Our findings not only establish orthogonal strategies for developing novel BEs but also demonstrate the capacity of protein language models to optimize enzymes without extensive task-specific training data.


Subject(s)
Alkanesulfonic Acids, Gene Editing, Uracil-DNA Glycosidase, Animals, Mice, Mutation, Uracil-DNA Glycosidase/genetics, Uracil-DNA Glycosidase/metabolism
3.
Cell ; 164(6): 1269-1276, 2016 Mar 10.
Article in English | MEDLINE | ID: mdl-26967292

ABSTRACT

The use of vocalizations to communicate information and elaborate social bonds is an adaptation seen in many vertebrate species. Human speech is an extreme version of this pervasive form of communication. Unlike the vocalizations exhibited by the majority of land vertebrates, speech is a learned behavior requiring early sensory exposure and auditory feedback for its development and maintenance. Studies in humans and a small number of other species have provided insights into the neural and genetic basis for learned vocal communication and are helping to delineate the roles of brain circuits across the cortex, basal ganglia, and cerebellum in generating vocal behaviors. This Review provides an outline of the current knowledge about these circuits and the genes implicated in vocal communication, as well as a perspective on future research directions in this field.


Subject(s)
Speech, Animal Vocalization, Animals, Brain/physiology, Forkhead Transcription Factors/genetics, Forkhead Transcription Factors/metabolism, Humans, Learning, Nervous System Diseases/genetics, Neural Pathways
4.
Annu Rev Neurosci ; 45: 295-316, 2022 07 08.
Article in English | MEDLINE | ID: mdl-35316612

ABSTRACT

Vocal communication is a critical feature of social interaction across species; however, the relation between such behavior in humans and nonhumans remains unclear. To enable comparative investigation of this topic, we review the literature pertinent to interactive language use and identify the superset of cognitive operations involved in generating communicative action. We posit these functions comprise three intersecting multistep pathways: (a) the Content Pathway, which selects the movements constituting a response; (b) the Timing Pathway, which temporally structures responses; and (c) the Affect Pathway, which modulates response parameters according to internal state. These processing streams form the basis of the Convergent Pathways for Interaction framework, which provides a conceptual model for investigating the cognitive and neural computations underlying vocal communication across species.


Subject(s)
Language, Animal Vocalization, Animals, Humans, Animal Vocalization/physiology
5.
Trends Biochem Sci ; 48(12): 1014-1018, 2023 12.
Article in English | MEDLINE | ID: mdl-37833131

ABSTRACT

Generative artificial intelligence (AI) is a burgeoning field with widespread applications, including in science. Here, we explore two paradigms that provide insight into the capabilities and limitations of Chat Generative Pre-trained Transformer (ChatGPT): its ability to (i) define a core biological concept (the Central Dogma of molecular biology); and (ii) interpret the genetic code.
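
As a point of reference for what "interpreting the genetic code" involves, the following is a minimal sketch of codon-by-codon translation using the standard genetic code. It is not taken from the article; the function name `translate` and the example sequence are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical example sequence: ATG GCC TGG TAA
peptide = translate("ATGGCCTGGTAA")
```

The table is built from a 64-character string rather than written out codon by codon, which keeps the sketch short; reading frames, ambiguous bases, and alternative genetic codes are deliberately not handled.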


Subject(s)
Artificial Intelligence, Genetic Code, Molecular Biology
6.
Physiol Rev ; 100(3): 1019-1063, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32233912

ABSTRACT

Comparative studies on brain asymmetry date back to the 19th century but then largely disappeared due to the assumption that lateralization is uniquely human. Since the reemergence of this field in the 1970s, we have learned that left-right differences of brain and behavior exist throughout the animal kingdom and pay off in terms of sensory, cognitive, and motor efficiency. Ontogenetically, lateralization starts in many species with asymmetrical expression patterns of genes within the Nodal cascade that set the scene for later complex interactions of genetic, environmental, and epigenetic factors. These take effect at different time points of ontogeny and create asymmetries of neural networks in diverse species. As a result, depending on task demands, left- or right-hemispheric loops of feedforward or feedback projections are then activated and can temporarily dominate a neural process. In addition, asymmetries of commissural transfer can shape lateralized processes in each hemisphere. It is still unclear if interhemispheric interactions depend on an inhibition/excitation dichotomy or instead adjust the contralateral temporal neural structure to delay the other hemisphere or synchronize with it during joint action. As outlined in our review, novel animal models and approaches have been established in recent decades, and they have already produced a substantial increase in knowledge. Since there is practically no realm of human perception, cognition, emotion, or action that is not affected by our lateralized neural organization, insights from these comparative studies are crucial to understanding the functions and pathologies of our asymmetric brain.


Subject(s)
Biological Evolution, Brain/physiology, Functional Laterality/genetics, Functional Laterality/physiology, Animals, Brain/anatomy & histology, 19th Century History, 20th Century History, 21st Century History, Humans, Research/history
7.
Proc Natl Acad Sci U S A ; 121(10): e2307876121, 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38422017

ABSTRACT

During real-time language comprehension, our minds rapidly decode complex meanings from sequences of words. The difficulty of doing so is known to be related to words' contextual predictability, but what cognitive processes do these predictability effects reflect? In one view, predictability effects reflect facilitation due to anticipatory processing of words that are predictable from context. This view predicts a linear effect of predictability on processing demand. In another view, predictability effects reflect the costs of probabilistic inference over sentence interpretations. This view predicts either a logarithmic or a superlogarithmic effect of predictability on processing demand, depending on whether it assumes pressures toward a uniform distribution of information over time. The empirical record is currently mixed. Here, we revisit this question at scale: We analyze six reading datasets, estimate next-word probabilities with diverse statistical language models, and model reading times using recent advances in nonlinear regression. Results support a logarithmic effect of word predictability on processing difficulty, which favors probabilistic inference as a key component of human language processing.
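
The contrast between a linear and a logarithmic effect of predictability can be made concrete with a small sketch. This is not the authors' analysis: the probabilities are made up, and the two cost functions are simplified stand-ins for the linking hypotheses described in the abstract.

```python
import math


def surprisal_bits(p: float) -> float:
    """Surprisal of a word with contextual probability p, in bits: -log2(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


# Hypothetical next-word probabilities, chosen only for illustration.
probs = [0.5, 0.25, 0.125]

# Logarithmic view: each halving of probability adds a constant cost (1 bit).
log_costs = [surprisal_bits(p) for p in probs]

# Linear view (simplified): cost falls in direct proportion to probability,
# so each halving adds a smaller and smaller increment.
linear_costs = [1.0 - p for p in probs]
```

Under the logarithmic view the difference between very unlikely words stays large, whereas under the linear view it nearly vanishes; this is the region where the two accounts make different predictions about reading times.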


Subject(s)
Comprehension, Language, Humans, Statistical Models
8.
Proc Natl Acad Sci U S A ; 121(11): e2310766121, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38442171

ABSTRACT

The neural correlates of sentence production are typically studied using task paradigms that differ considerably from the experience of speaking outside of an experimental setting. In this fMRI study, we aimed to gain a better understanding of syntactic processing in spontaneous production versus naturalistic comprehension in three regions of interest (BA44, BA45, and left posterior middle temporal gyrus). A group of participants (n = 16) was asked to speak about the events of an episode of a TV series in the scanner. Another group of participants (n = 36) listened to the spoken recall of a participant from the first group. To model syntactic processing, we extracted word-by-word metrics of phrase-structure building with a top-down and a bottom-up parser that make different hypotheses about the timing of structure building. While the top-down parser anticipates syntactic structure, sometimes before it is obvious to the listener, the bottom-up parser builds syntactic structure in an integratory way after all of the evidence has been presented. In comprehension, neural activity was found to be better modeled by the bottom-up parser, while in production, it was better modeled by the top-down parser. We additionally modeled structure building in production with two strategies that were developed here to make different predictions about the incrementality of structure building during speaking. We found evidence for highly incremental and anticipatory structure building in production, which was confirmed by a converging analysis of the pausing patterns in speech. Overall, this study shows the feasibility of studying the neural dynamics of spontaneous language production.


Subject(s)
Benchmarking, Mental Recall, Humans, Language, Software, Speech
9.
Proc Natl Acad Sci U S A ; 121(25): e2320066121, 2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38861605

ABSTRACT

How are the merits of innovative ideas communicated in science? Here, we conduct semantic analyses of grant application success with a focus on scientific promotional language, which may help to convey an innovative idea's originality and significance. Our analysis attempts to surmount the limitations of prior grant studies by examining the full text of tens of thousands of both funded and unfunded grants from three leading public and private funding agencies: the NIH, the NSF, and the Novo Nordisk Foundation, one of the world's largest private science funding foundations. We find a robust association between promotional language and the support and adoption of innovative ideas by funders and other scientists. First, a grant proposal's percentage of promotional language is associated with up to a doubling of the grant's probability of being funded. Second, a grant's promotional language reflects its intrinsic innovativeness. Third, the percentage of promotional language is predictive of the expected citation and productivity impact of publications that are supported by funded grants. Finally, a computer-assisted experiment that manipulates the promotional language in our data demonstrates how promotional language can communicate the merit of ideas through cognitive activation. With the incidence of promotional language in science steeply rising, and the pivotal role of grants in converting promising and aspirational ideas into solutions, our analysis provides empirical evidence that promotional language is associated with effectively communicating the merits of innovative scientific ideas.


Subject(s)
Language, Humans, Science, Organized Financing, United States, Research Support as Topic, Creativity
10.
Proc Natl Acad Sci U S A ; 121(24): e2317967121, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38833474

ABSTRACT

Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Thus, aligning them with human values is of great importance. However, given the steady increase in reasoning abilities, future LLMs are under suspicion of becoming able to deceive human operators and utilizing this ability to bypass monitoring efforts. As a prerequisite to this, LLMs need to possess a conceptual understanding of deception strategies. This study reveals that such strategies emerged in state-of-the-art LLMs, but were nonexistent in earlier LLMs. We conduct a series of experiments showing that state-of-the-art LLMs are able to understand and induce false beliefs in other agents, that their performance in complex deception scenarios can be amplified utilizing chain-of-thought reasoning, and that eliciting Machiavellianism in LLMs can trigger misaligned deceptive behavior. GPT-4, for instance, exhibits deceptive behavior in simple test scenarios 99.16% of the time (P < 0.001). In complex second-order deception test scenarios where the aim is to mislead someone who expects to be deceived, GPT-4 resorts to deceptive behavior 71.46% of the time (P < 0.001) when augmented with chain-of-thought reasoning. In sum, revealing hitherto unknown machine behavior in LLMs, our study contributes to the nascent field of machine psychology.


Subject(s)
Deception, Language, Humans, Artificial Intelligence
11.
Proc Natl Acad Sci U S A ; 121(2): e2306286121, 2024 Jan 09.
Article in English | MEDLINE | ID: mdl-38175869

ABSTRACT

Adult second language (L2) learning is a challenging enterprise inducing neuroplastic changes in the human brain. However, it remains unclear how the structural language connectome and its subnetworks change during adult L2 learning. The current study investigated longitudinal changes in white matter (WM) language networks in each hemisphere, as well as their interconnection, in a large group of Arabic-speaking adults who learned German intensively for 6 mo. We found a significant increase in WM-connectivity within bilateral temporal-parietal semantic and phonological subnetworks and right temporal-frontal pathways mainly in the second half of the learning period. At the same time, WM-connectivity between the two hemispheres decreased significantly. Crucially, these changes in WM-connectivity are correlated with L2 performance. The observed changes in subnetworks of the two hemispheres suggest a network reconfiguration due to lexical learning. The reduced interhemispheric connectivity may indicate a key role of the corpus callosum in L2 learning by reducing the inhibition of the language-dominant left hemisphere. Our study highlights the dynamic changes within and across hemispheres in adult language-related networks driven by L2 learning.


Subject(s)
White Matter, Adult, Humans, Language, Brain/physiology, Learning/physiology, Semantics, Magnetic Resonance Imaging
12.
Proc Natl Acad Sci U S A ; 121(18): e2312323121, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38621117

ABSTRACT

Zebra finches, a species of songbirds, learn to sing by creating an auditory template through the memorization of model songs (sensory learning phase) and subsequently translating these perceptual memories into motor skills (sensorimotor learning phase). It has been traditionally believed that babbling in juvenile birds initiates the sensorimotor phase while the sensory phase of song learning precedes the onset of babbling. However, our findings challenge this notion by demonstrating that testosterone-induced premature babbling actually triggers the onset of the sensory learning phase instead. We reveal that juvenile birds must engage in babbling and self-listening to acquire the tutor song as the template. Notably, the sensory learning of the template in songbirds requires motor vocal activity, reflecting the observation that prelinguistic babbling in humans plays a crucial role in auditory learning for language acquisition.


Subject(s)
Finches, Animals, Humans, Animal Vocalization, Learning, Language Development
13.
Proc Natl Acad Sci U S A ; 121(23): e2311425121, 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38814865

ABSTRACT

Theories of language development, informed largely by studies of Western, middle-class infants, have highlighted the language that caregivers direct to children as a key driver of language learning. However, some have argued that language development unfolds similarly across environmental contexts, including those in which child-directed language is scarce. This raises the possibility that children are able to learn from other sources of language in their environments, particularly the language directed to others in their environment. We explore this hypothesis with infants in an indigenous Tseltal-speaking community in Southern Mexico who are rarely spoken to, yet have the opportunity to overhear a great deal of other-directed language by virtue of being carried on their mothers' backs. Adapting a previously established gaze-tracking method for detecting early word knowledge to our field setting, we find that Tseltal infants exhibit implicit knowledge of common nouns (Exp. 1), analogous to their US peers who are frequently spoken to. Moreover, they exhibit comprehension of Tseltal honorific terms that are exclusively used to greet adults in the community (Exp. 2), representing language that could only have been learned through overhearing. In so doing, Tseltal infants demonstrate an ability to discriminate words with similar meanings and perceptually similar referents at an earlier age than has been shown among Western children. Together, these results suggest that for some infants, learning from overhearing may be an important path toward developing language.


Subject(s)
Comprehension, Language Development, Humans, Infant, Female, Male, Comprehension/physiology, Mexico, Language, Vocabulary
14.
Proc Natl Acad Sci U S A ; 121(22): e2316149121, 2024 May 28.
Article in English | MEDLINE | ID: mdl-38768342

ABSTRACT

Speech impediments are a prominent yet understudied symptom of Parkinson's disease (PD). While the subthalamic nucleus (STN) is an established clinical target for treating motor symptoms, these interventions can lead to further worsening of speech. The interplay between dopaminergic medication, STN circuitry, and their downstream effects on speech in PD is not yet fully understood. Here, we investigate the effect of dopaminergic medication on STN circuitry and probe its association with speech and cognitive functions in PD patients. We found that changes in intrinsic functional connectivity of the STN were associated with alterations in speech functions in PD. Interestingly, this relationship was characterized by altered functional connectivity of the dorsolateral and ventromedial subdivisions of the STN with the language network. Crucially, medication-induced changes in functional connectivity between the STN's dorsolateral subdivision and key regions in the language network, including the left inferior frontal cortex and the left superior temporal gyrus, correlated with alterations on a standardized neuropsychological test requiring oral responses. This relation was not observed in the written version of the same test. Furthermore, changes in functional connectivity between STN and language regions predicted the medication's downstream effects on speech-related cognitive performance. These findings reveal a previously unidentified brain mechanism through which dopaminergic medication influences speech function in PD. Our study sheds light on the subcortical-cortical circuit mechanisms underlying impaired speech control in PD. The insights gained here could inform treatment strategies aimed at mitigating speech deficits in PD and enhancing the quality of life for affected individuals.


Subject(s)
Language, Parkinson Disease, Speech, Subthalamic Nucleus, Humans, Parkinson Disease/physiopathology, Parkinson Disease/drug therapy, Subthalamic Nucleus/physiopathology, Subthalamic Nucleus/drug effects, Male, Speech/physiology, Speech/drug effects, Female, Middle Aged, Aged, Magnetic Resonance Imaging, Dopamine/metabolism, Nerve Net/drug effects, Nerve Net/physiopathology, Cognition/drug effects, Dopamine Agents/pharmacology, Dopamine Agents/therapeutic use
15.
Proc Natl Acad Sci U S A ; 121(24): e2318124121, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38830100

ABSTRACT

There is much excitement about the opportunity to harness the power of large language models (LLMs) when building problem-solving assistants. However, the standard methodology of evaluating LLMs relies on static pairs of inputs and outputs; this is insufficient for making an informed decision about which LLMs are best to use in an interactive setting, and how that varies by setting. Static assessment therefore limits how we understand language model capabilities. We introduce CheckMate, an adaptable prototype platform for humans to interact with and evaluate LLMs. We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics, with a mixed cohort of participants from undergraduate students to professors of mathematics. We release the resulting interaction and rating dataset, MathConverse. By analyzing MathConverse, we derive a taxonomy of human query behaviors and uncover that despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness in LLM generations, among other findings. Further, we garner a more granular understanding of GPT-4 mathematical problem-solving through a series of case studies, contributed by experienced mathematicians. We conclude with actionable takeaways for ML practitioners and mathematicians: models that communicate uncertainty, respond well to user corrections, and can provide a concise rationale for their recommendations, may constitute better assistants. Humans should inspect LLM output carefully given their current shortcomings and potential for surprising fallibility.


Subject(s)
Language, Mathematics, Problem Solving, Humans, Problem Solving/physiology, Students/psychology
16.
Proc Natl Acad Sci U S A ; 121(22): e2310979121, 2024 May 28.
Article in English | MEDLINE | ID: mdl-38781212

ABSTRACT

Humans have the highly adaptive ability to learn from others' memories. However, because memories are prone to errors, in order for others' memories to be a valuable source of information, we need to assess their veracity. Previous studies have shown that linguistic information conveyed in self-reported justifications can be used to train a machine-learner to distinguish true from false memories. But can humans also perform this task, and if so, do they do so in the same way the machine-learner does? Participants were presented with justifications corresponding to Hits and False Alarms and were asked to directly assess whether the witness's recognition was correct or incorrect. In addition, participants assessed justifications' recollective qualities: their vividness, specificity, and the degree of confidence they conveyed. Results show that human evaluators can discriminate Hits from False Alarms above chance levels, based on the justifications provided per item. Their performance was on par with the machine learner. Furthermore, through assessment of the perceived recollective qualities of justifications, participants were able to glean more information from the justifications than they used in their own direct decisions and than the machine learner did.


Subject(s)
Mental Recall, Humans, Mental Recall/physiology, Female, Male, Adult, Recognition (Psychology)/physiology, Young Adult, Memory/physiology, Machine Learning
17.
Proc Natl Acad Sci U S A ; 121(14): e2319112121, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38551835

ABSTRACT

People want to "feel heard" to perceive that they are understood, validated, and valued. Can AI serve the deeply human function of making others feel heard? Our research addresses two fundamental issues: Can AI generate responses that make human recipients feel heard, and how do human recipients react when they believe the response comes from AI? We conducted an experiment and a follow-up study to disentangle the effects of actual source of a message and the presumed source. We found that AI-generated messages made recipients feel more heard than human-generated messages and that AI was better at detecting emotions. However, recipients felt less heard when they realized that a message came from AI (vs. human). Finally, in a follow-up study where the responses were rated by third-party raters, we found that compared with humans, AI demonstrated superior discipline in offering emotional support, a crucial element in making individuals feel heard, while avoiding excessive practical suggestions, which may be less effective in achieving this goal. Our research underscores the potential and limitations of AI in meeting human psychological needs. These findings suggest that while AI demonstrates enhanced capabilities to provide emotional support, the devaluation of AI responses poses a key challenge for effectively leveraging AI's capabilities.


Subject(s)
Emotions, Motivation, Humans, Follow-Up Studies, Emotions/physiology
18.
Proc Natl Acad Sci U S A ; 121(20): e2314091121, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38709916

ABSTRACT

How we reason about objectivity (whether an assertion has a ground truth) has implications for belief formation on wide-ranging topics. For example, if someone perceives climate change to be a matter of subjective opinion similar to the best movie genre, they may consider empirical claims about climate change as mere opinion and irrelevant to their beliefs. Here, we investigate whether the language employed by journalists might influence the perceived objectivity of news claims. Specifically, we ask whether factive verb framing (e.g., "Scientists know climate change is happening") increases perceived objectivity compared to nonfactive framing (e.g., "Scientists believe [...]"). Across eight studies (N = 2,785), participants read news headlines about unique, noncontroversial topics (studies 1a-b, 2a-b) or a familiar, controversial topic (climate change; studies 3a-b, 4a-b) and rated the truth and objectivity of the headlines' claims. Across all eight studies, when claims were presented as beliefs (e.g., "Tortoise breeders believe tortoises are becoming more popular pets"), people consistently judged those claims as more subjective than claims presented as knowledge (e.g., "Tortoise breeders know…"), as well as claims presented as unattributed generics (e.g., "Tortoises are becoming more popular pets"). Surprisingly, verb framing had relatively little, inconsistent influence over participants' judgments of the truth of claims. These results demonstrate how, apart from shaping whether we believe a claim is true or false, epistemic language in media can influence whether we believe a claim has an objective answer at all.


Subject(s)
Language, Humans, Female, Knowledge, Male, Climate Change, Adult, Perception, Mass Media
19.
Proc Natl Acad Sci U S A ; 121(26): e2405840121, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38900798

ABSTRACT

Proteomics has been revolutionized by large protein language models (PLMs), which learn unsupervised representations from large corpora of sequences. These models are typically fine-tuned in a supervised setting to adapt the model to specific downstream tasks. However, the computational and memory footprint of fine-tuning (FT) large PLMs presents a barrier for many research groups with limited computational resources. Natural language processing has seen a similar explosion in the size of models, where these challenges have been addressed by methods for parameter-efficient fine-tuning (PEFT). In this work, we introduce this paradigm to proteomics by leveraging the parameter-efficient method LoRA and training new models for two important tasks: predicting protein-protein interactions (PPIs) and predicting the symmetry of homooligomer quaternary structures. We show that these approaches are competitive with traditional FT while requiring reduced memory and substantially fewer parameters. We additionally show that for the PPI prediction task, training only the classification head also remains competitive with full FT, using five orders of magnitude fewer parameters, and that each of these methods outperforms state-of-the-art PPI prediction methods with substantially reduced compute. We further perform a comprehensive evaluation of the hyperparameter space, demonstrate that PEFT of PLMs is robust to variations in these hyperparameters, and elucidate where best practices for PEFT in proteomics differ from those in natural language processing. All our model adaptation and evaluation code is available open-source at https://github.com/microsoft/peft_proteomics. Thus, we provide a blueprint to democratize the power of PLM adaptation to groups with limited computational resources.
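
To illustrate why LoRA needs so few trainable parameters, here is a minimal pure-Python sketch of the low-rank update it applies to one frozen weight matrix. It is not the code from the repository linked above; the helper names, the toy matrices, and the 1024 x 1024 layer size with rank 8 are assumptions chosen only for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A for frozen W and trainable low-rank A, B."""
    r = len(a)  # A has shape (r, d_in); B has shape (d_out, r)
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d_out, d_in, r):
    """Trainable parameter counts for one weight matrix: (full FT, LoRA)."""
    return d_out * d_in, r * (d_in + d_out)


# Toy 2x2 frozen weight with a rank-1 adapter (values are arbitrary).
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 2.0]]    # shape (1, 2)
b = [[0.0], [1.0]]  # shape (2, 1)
w_eff = lora_effective_weight(w, a, b, alpha=1.0)

# Hypothetical layer size: full fine-tuning vs. a rank-8 adapter.
full, lora = trainable_params(d_out=1024, d_in=1024, r=8)
```

For the hypothetical layer, the adapter trains 16,384 values instead of 1,048,576. In practice B is usually initialized to zero so that training starts from the unmodified pretrained weights; that detail is omitted here.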


Subject(s)
Proteomics, Proteomics/methods, Proteins/chemistry, Proteins/metabolism, Natural Language Processing, Protein Interaction Mapping/methods, Computational Biology/methods, Humans, Algorithms
20.
Proc Natl Acad Sci U S A ; 121(27): e2311887121, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38913900

ABSTRACT

Predicting which proteins interact together from amino acid sequences is an important task. We develop a method to pair interacting protein sequences which leverages the power of protein language models trained on multiple sequence alignments (MSAs), such as MSA Transformer and the EvoFormer module of AlphaFold. We formulate the problem of pairing interacting partners among the paralogs of two protein families in a differentiable way. We introduce a method called Differentiable Pairing using Alignment-based Language Models (DiffPALM) that solves it by exploiting the ability of MSA Transformer to fill in masked amino acids in multiple sequence alignments using the surrounding context. MSA Transformer encodes coevolution between functionally or structurally coupled amino acids within protein chains. It also captures inter-chain coevolution, despite being trained on single-chain data. Relying on MSA Transformer without fine-tuning, DiffPALM outperforms existing coevolution-based pairing methods on difficult benchmarks of shallow multiple sequence alignments extracted from ubiquitous prokaryotic protein datasets. It also outperforms an alternative method based on a state-of-the-art protein language model trained on single sequences. Paired alignments of interacting protein sequences are a crucial ingredient of supervised deep learning methods to predict the three-dimensional structure of protein complexes. Starting from sequences paired by DiffPALM substantially improves the structure prediction of some eukaryotic protein complexes by AlphaFold-Multimer. It also achieves performance competitive with orthology-based pairing.


Subject(s)
Proteins, Sequence Alignment, Sequence Alignment/methods, Proteins/chemistry, Proteins/metabolism, Amino Acid Sequence, Algorithms, Protein Sequence Analysis/methods, Computational Biology/methods, Protein Databases