Results 1-20 of 29,793
1.
Nature ; 631(8021): 610-616, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38961302

ABSTRACT

From sequences of speech sounds[1,2] or letters[3], humans can extract rich and nuanced meaning through language. This capacity is essential for human communication. Yet, despite a growing understanding of the brain areas that support linguistic and semantic processing[4-12], the derivation of linguistic meaning in neural tissue at the cellular level and over the timescale of action potentials remains largely unknown. Here we recorded from single cells in the left language-dominant prefrontal cortex as participants listened to semantically diverse sentences and naturalistic stories. By tracking their activities during natural speech processing, we discover a fine-scale cortical representation of semantic information by individual neurons. These neurons responded selectively to specific word meanings and reliably distinguished words from nonwords. Moreover, rather than responding to the words as fixed memory representations, their activities were highly dynamic, reflecting the words' meanings based on their specific sentence contexts and independent of their phonetic form. Collectively, we show how these cell ensembles accurately predicted the broad semantic categories of the words as they were heard in real time during speech and how they tracked the sentences in which they appeared. We also show how they encoded the hierarchical structure of these meaning representations and how these representations mapped onto the cell population. Together, these findings reveal a finely detailed cortical organization of semantic representations at the neuron scale in humans and begin to illuminate the cellular-level processing of meaning during language comprehension.


Subjects
Comprehension, Neurons, Prefrontal Cortex, Semantics, Single-Cell Analysis, Speech Perception, Adult, Aged, Female, Humans, Male, Middle Aged, Comprehension/physiology, Neurons/physiology, Phonetics, Prefrontal Cortex/physiology, Prefrontal Cortex/cytology, Speech Perception/physiology, Narration
2.
Nat Rev Genet ; 24(3): 197-204, 2023 03.
Article in English | MEDLINE | ID: mdl-36316396

ABSTRACT

Research linking genetic differences with human social and behavioural phenotypes has long been controversial. Frequently, debates about the ethical, social and legal implications of this area of research centre on questions about whether studies overtly or covertly perpetuate genetic determinism, genetic essentialism and/or genetic reductionism. Given the prominent role of the '-isms' in scientific discourse and criticism, it is important for there to be consensus and clarity about the meaning of these terms. Here, the author integrates scholarship from psychology, genetics and philosophy of science to provide accessible definitions of genetic determinism, genetic reductionism and genetic essentialism. The author provides linguistic and visual examples of determinism, reductionism and essentialism in science and popular culture, discusses common misconceptions and concludes with recommendations for science communication.


Subjects
Genetic Determinism, Semantics, Humans
3.
Nature ; 623(7989): 1070-1078, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37968394

ABSTRACT

Three billion years of evolution has produced a tremendous diversity of protein molecules[1], but the full potential of proteins is likely to be much greater. Accessing this potential has been challenging for both computation and experiments because the space of possible protein molecules is much larger than the space of those likely to have functions. Here we introduce Chroma, a generative model for proteins and protein complexes that can directly sample novel protein structures and sequences, and that can be conditioned to steer the generative process towards desired properties and functions. To enable this, we introduce a diffusion process that respects the conformational statistics of polymer ensembles, an efficient neural architecture for molecular systems that enables long-range reasoning with sub-quadratic scaling, layers for efficiently synthesizing three-dimensional structures of proteins from predicted inter-residue geometries and a general low-temperature sampling algorithm for diffusion models. Chroma achieves protein design as Bayesian inference under external constraints, which can involve symmetries, substructure, shape, semantics and even natural-language prompts. The experimental characterization of 310 proteins shows that sampling from Chroma results in proteins that are highly expressed, fold and have favourable biophysical properties. The crystal structures of two designed proteins exhibit atomistic agreement with Chroma samples (a backbone root-mean-square deviation of around 1.0 Å). With this unified approach to protein design, we hope to accelerate the programming of protein matter to benefit human health, materials science and synthetic biology.


Subjects
Algorithms, Computer Simulation, Protein Conformation, Proteins, Humans, Bayes Theorem, Directed Molecular Evolution, Machine Learning, Molecular Models, Protein Folding, Proteins/chemistry, Proteins/metabolism, Semantics, Synthetic Biology/methods, Synthetic Biology/trends
4.
Nat Rev Neurosci ; 24(2): 113-128, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36460920

ABSTRACT

Understanding what someone says requires relating words in a sentence to one another as instructed by the grammatical rules of a language. In recent years, the neurophysiological basis for this process has become a prominent topic of discussion in cognitive neuroscience. Current proposals about the neural mechanisms of syntactic structure building converge on a key role for neural oscillations in this process, but they differ in terms of the exact function that is assigned to them. In this Perspective, we discuss two proposed functions for neural oscillations - chunking and multiscale information integration - and evaluate their merits and limitations taking into account the fundamentally hierarchical nature of syntactic representations in natural languages. We highlight insights that provide a tangible starting point for a neurocognitive model of syntactic structure building.


Subjects
Language, Memory, Humans, Semantics
5.
Nat Methods ; 21(2): 195-212, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38347141

ABSTRACT

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint, a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.


Subjects
Algorithms, Computer-Assisted Image Processing, Machine Learning, Semantics
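For a concrete sense of the overlap metrics that Metrics Reloaded helps users choose among for semantic segmentation, the sketch below computes two common ones, the Dice similarity coefficient and intersection over union (IoU), for binary masks given as nested lists of 0/1. This is not code from the Metrics Reloaded framework or its online tool. The function name and the rule that two empty masks score 1.0 are assumptions made for illustration.

```python
def dice_and_iou(pred, ref):
    """Return (Dice, IoU) for two binary masks given as nested lists of 0/1."""
    # Collect the coordinates of foreground pixels in each mask.
    p = {(r, c) for r, row in enumerate(pred) for c, v in enumerate(row) if v}
    g = {(r, c) for r, row in enumerate(ref) for c, v in enumerate(row) if v}
    # Assumed convention: two empty masks are treated as perfect agreement.
    if not p and not g:
        return 1.0, 1.0
    inter = len(p & g)
    dice = 2 * inter / (len(p) + len(g))
    iou = inter / len(p | g)
    return dice, iou


pred = [[1, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
ref = [[1, 0, 0],
       [0, 1, 1],
       [0, 0, 0]]
dice, iou = dice_and_iou(pred, ref)
print(round(dice, 3), round(iou, 3))  # 0.667 0.5
```

The two metrics are monotonically related (IoU = Dice / (2 - Dice)), so they rank predictions identically. How to score empty masks is one example of an edge case where metric conventions can differ.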
6.
PLoS Biol ; 22(5): e3002622, 2024 May.
Article in English | MEDLINE | ID: mdl-38814982

ABSTRACT

Combinatoric linguistic operations underpin human language processes, but how meaning is composed and refined in the mind of the reader is not well understood. We address this puzzle by exploiting the ubiquitous function of negation. We track the online effects of negation ("not") and intensifiers ("really") on the representation of scalar adjectives (e.g., "good") in parametrically designed behavioral and neurophysiological (MEG) experiments. The behavioral data show that participants first interpret negated adjectives as affirmative and later modify their interpretation towards, but never exactly as, the opposite meaning. Decoding analyses of neural activity further reveal significant above-chance decoding accuracy for negated adjectives within 600 ms from adjective onset, suggesting that negation does not invert the representation of adjectives (i.e., "not bad" represented as "good"); furthermore, decoding accuracy for negated adjectives is found to be significantly lower than that for affirmative adjectives. Overall, these results suggest that negation mitigates rather than inverts the neural representations of adjectives. This putative suppression mechanism of negation is supported by increased synchronization of beta-band neural activity in sensorimotor areas. The analysis of negation provides a stepping stone to understand how the human brain represents changes of meaning over time.


Subjects
Language, Humans, Female, Male, Adult, Young Adult, Brain/physiology, Magnetoencephalography/methods, Semantics, Linguistics/methods
7.
PLoS Genet ; 20(2): e1010657, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38377104

ABSTRACT

A growing body of evidence suggests that gene flow between closely related species is a widespread phenomenon. Alleles that introgress from one species into a close relative are typically neutral or deleterious, but sometimes confer a significant fitness advantage. Given the potential relevance to speciation and adaptation, numerous methods have therefore been devised to identify regions of the genome that have experienced introgression. Recently, supervised machine learning approaches have been shown to be highly effective for detecting introgression. One especially promising approach is to treat population genetic inference as an image classification problem, and feed an image representation of a population genetic alignment as input to a deep neural network that distinguishes among evolutionary models (i.e. introgression or no introgression). However, if we wish to investigate the full extent and fitness effects of introgression, merely identifying genomic regions in a population genetic alignment that harbor introgressed loci is insufficient; ideally we would be able to infer precisely which individuals have introgressed material and at which positions in the genome. Here we adapt a deep learning algorithm for semantic segmentation, the task of correctly identifying the type of object to which each individual pixel in an image belongs, to the task of identifying introgressed alleles. Our trained neural network is thus able to infer, for each individual in a two-population alignment, which of those individual's alleles were introgressed from the other population. We use simulated data to show that this approach is highly accurate, and that it can be readily extended to identify alleles that are introgressed from an unsampled "ghost" population, performing comparably to a supervised learning method tailored specifically to that task.
Finally, we apply this method to data from Drosophila, showing that it is able to accurately recover introgressed haplotypes from real data. This analysis reveals that introgressed alleles are typically confined to lower frequencies within genic regions, suggestive of purifying selection, but are found at much higher frequencies in a region previously shown to be affected by adaptive introgression. Our method's success in recovering introgressed haplotypes in challenging real-world scenarios underscores the utility of deep learning approaches for making richer evolutionary inferences from genomic data.


Subjects
Population Genetics, Semantics, Humans, Alleles, Genomics, Biological Evolution
8.
Proc Natl Acad Sci U S A ; 121(30): e2315438121, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39028693

ABSTRACT

There is evidence from both behavior and brain activity that the way information is structured, through the use of focus, can up-regulate processing of focused constituents, likely to give prominence to the relevant aspects of the input. This is hypothesized to be universal, regardless of the different ways in which languages encode focus. In order to test this universalist hypothesis, we need to go beyond the more familiar linguistic strategies for marking focus, such as by means of intonation or specific syntactic structures (e.g., it-clefts). Therefore, in this study, we examine Makhuwa-Enahara, a Bantu language spoken in northern Mozambique, which uniquely marks focus through verbal conjugation. The participants were presented with sentences that consisted of either a semantically anomalous constituent or a semantically nonanomalous constituent. Moreover, focus on this particular constituent could be either present or absent. We observed a consistent pattern: Focused information generated a more negative N400 response than the same information in nonfocus position. This demonstrates that regardless of how focus is marked, its consequence seems to result in an upregulation of processing of information that is in focus.


Subjects
Language, Humans, Female, Male, Adult, Mozambique, Electroencephalography, Semantics, Brain/physiology, Young Adult, Linguistics, Evoked Potentials/physiology
9.
Proc Natl Acad Sci U S A ; 121(2): e2306286121, 2024 Jan 09.
Article in English | MEDLINE | ID: mdl-38175869

ABSTRACT

Adult second language (L2) learning is a challenging enterprise inducing neuroplastic changes in the human brain. However, it remains unclear how the structural language connectome and its subnetworks change during adult L2 learning. The current study investigated longitudinal changes in white matter (WM) language networks in each hemisphere, as well as their interconnection, in a large group of Arabic-speaking adults who learned German intensively for 6 mo. We found a significant increase in WM-connectivity within bilateral temporal-parietal semantic and phonological subnetworks and right temporal-frontal pathways mainly in the second half of the learning period. At the same time, WM-connectivity between the two hemispheres decreased significantly. Crucially, these changes in WM-connectivity are correlated with L2 performance. The observed changes in subnetworks of the two hemispheres suggest a network reconfiguration due to lexical learning. The reduced interhemispheric connectivity may indicate a key role of the corpus callosum in L2 learning by reducing the inhibition of the language-dominant left hemisphere. Our study highlights the dynamic changes within and across hemispheres in adult language-related networks driven by L2 learning.


Subjects
White Matter, Adult, Humans, Language, Brain/physiology, Learning/physiology, Semantics, Magnetic Resonance Imaging
10.
Nat Methods ; 20(4): 569-579, 2023 04.
Article in English | MEDLINE | ID: mdl-36997816

ABSTRACT

The ability to quantify structural changes of the endoplasmic reticulum (ER) is crucial for understanding the structure and function of this organelle. However, the rapid movement and complex topology of ER networks make this challenging. Here, we construct a state-of-the-art semantic segmentation method that we call ERnet for the automatic classification of sheet and tubular ER domains inside individual cells. Data are skeletonized and represented by connectivity graphs, enabling precise and efficient quantification of network connectivity. ERnet generates metrics on topology and integrity of ER structures and quantifies structural change in response to genetic or metabolic manipulation. We validate ERnet using data obtained by various ER-imaging methods from different cell types as well as ground truth images of synthetic ER structures. ERnet can be deployed in an automatic high-throughput and unbiased fashion and identifies subtle changes in ER phenotypes that may inform on disease progression and response to therapy.


Subjects
Endoplasmic Reticulum, Semantics, Endoplasmic Reticulum/metabolism
11.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38605639

ABSTRACT

The accurate identification of disease-associated genes is crucial for understanding the molecular mechanisms underlying various diseases. Most current methods focus on constructing biological networks and utilizing machine learning, particularly deep learning, to identify disease genes. However, these methods overlook complex relations among entities in biological knowledge graphs. Such information has been successfully applied in other areas of life science research, demonstrating its effectiveness. Knowledge graph embedding methods can learn the semantic information of different relations within the knowledge graphs. Nonetheless, the performance of existing representation learning techniques, when applied to domain-specific biological data, remains suboptimal. To solve these problems, we construct a biological knowledge graph centered on diseases and genes, and develop an end-to-end knowledge graph completion framework for disease gene prediction using interactional tensor decomposition named KDGene. KDGene incorporates an interaction module that bridges entity and relation embeddings within tensor decomposition, aiming to improve the representation of semantically similar concepts in specific domains and enhance the ability to accurately predict disease genes. Experimental results show that KDGene significantly outperforms state-of-the-art algorithms, whether existing disease gene prediction methods or knowledge graph embedding methods for general domains. Moreover, the comprehensive biological analysis of the predicted results further validates KDGene's capability to accurately identify new candidate genes. This work proposes a scalable knowledge graph completion framework to identify disease candidate genes, whose results promise to provide valuable references for further wet-lab experiments. Data and source code are available at https://github.com/2020MEAI/KDGene.


Subjects
Biological Science Disciplines, Automated Pattern Recognition, Algorithms, Machine Learning, Semantics
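The abstract does not describe KDGene's interaction module in enough detail to reproduce it. The general idea of knowledge graph completion by tensor decomposition can still be sketched with DistMult, a standard and much simpler trilinear model that scores a triple (head, relation, tail) as the sum of elementwise products of three embedding vectors. DistMult is swapped in here for illustration only. The toy embeddings and gene names below are hypothetical; a real model would learn its embeddings from the disease-gene graph.

```python
def distmult_score(head, rel, tail):
    """DistMult trilinear score: sum over k of head[k] * rel[k] * tail[k]."""
    return sum(h * r * t for h, r, t in zip(head, rel, tail))


def rank_tails(head, rel, candidates):
    """Return candidate names sorted from highest to lowest triple score."""
    scored = {name: distmult_score(head, rel, vec) for name, vec in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical 3-dimensional embeddings for one disease, one relation and three genes.
disease = [1.0, 0.5, -0.2]
associated_with = [0.8, 1.0, 0.3]
genes = {
    "GENE_A": [0.9, 0.4, 0.0],
    "GENE_B": [-0.5, 0.1, 1.0],
    "GENE_C": [0.2, 0.9, -0.8],
}
print(rank_tails(disease, associated_with, genes))  # ['GENE_A', 'GENE_C', 'GENE_B']
```

DistMult's score is symmetric in head and tail, so it cannot distinguish a relation from its inverse. Limitations like this motivate richer decompositions for directed biological relations.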
12.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39154194

ABSTRACT

Understanding the genetic basis of disease is a fundamental aspect of medical research, as genes are the classic units of heredity and play a crucial role in biological function. Identifying associations between genes and diseases is critical for diagnosis, prevention, prognosis, and drug development. Genes that encode proteins with similar sequences are often implicated in related diseases, as proteins causing identical or similar diseases tend to show limited variation in their sequences. Predicting gene-disease association (GDA) requires time-consuming and expensive experiments on a large number of potential candidate genes. Although methods have been proposed to predict associations between genes and diseases using traditional machine learning algorithms and graph neural networks, these approaches struggle to capture the deep semantic information within the genes and diseases and are dependent on training data. To alleviate this issue, we propose a novel GDA prediction model named FusionGDA, which utilizes a pre-training phase with a fusion module to enrich the gene and disease semantic representations encoded by pre-trained language models. Multi-modal representations are generated by the fusion module, which includes rich semantic information about two heterogeneous biomedical entities: protein sequences and disease descriptions. Subsequently, the pooling aggregation strategy is adopted to compress the dimensions of the multi-modal representation. In addition, FusionGDA employs a pre-training phase leveraging a contrastive learning loss to extract potential gene and disease features by training on a large public GDA dataset. To rigorously evaluate the effectiveness of the FusionGDA model, we conduct comprehensive experiments on five datasets and compare our proposed model with five competitive baseline models on the DisGeNet-Eval dataset. Notably, our case study further demonstrates the ability of FusionGDA to discover hidden associations effectively. 
The complete code and datasets of our experiments are available at https://github.com/ZhaohanM/FusionGDA.


Subjects
Machine Learning, Humans, Computational Biology/methods, Genetic Predisposition to Disease, Semantics, Algorithms, Genetic Association Studies, Computer Neural Networks
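The abstract states only that FusionGDA pre-trains with "a contrastive learning loss." One widely used form is InfoNCE, sketched below on the assumption that it or a close variant applies. Under InfoNCE, each gene representation should score higher (by cosine similarity) against its paired disease representation than against the other diseases in the batch. The temperature, vector sizes and function names are illustrative assumptions, not details from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(genes, diseases, temperature=0.1):
    """Mean InfoNCE loss where genes[i] and diseases[i] are the positive pair
    and every other disease in the batch serves as a negative."""
    n = len(genes)
    total = 0.0
    for i in range(n):
        logits = [cosine(genes[i], diseases[j]) / temperature for j in range(n)]
        m = max(logits)  # shift by the max so exp() cannot overflow
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / n


# Hypothetical 2-dimensional gene and disease representations.
genes = [[1.0, 0.0], [0.0, 1.0]]
diseases = [[0.9, 0.1], [0.1, 0.9]]
aligned = info_nce(genes, diseases)        # pairs match: loss near zero
swapped = info_nce(genes, diseases[::-1])  # pairs mismatched: loss is large
print(aligned, swapped)
```

Minimizing this loss pulls matched gene-disease pairs together and pushes mismatched pairs apart in the shared representation space.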
13.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38557678

ABSTRACT

Disease ontologies facilitate the semantic organization and representation of domain-specific knowledge. In the case of prostate cancer (PCa), large volumes of research results and clinical data have accumulated and need to be standardized for sharing and translational research. A formal representation of PCa-associated knowledge will be essential to diverse data standardization, data sharing and future knowledge graph extraction, deep phenotyping and explainable artificial intelligence development. In this study, we constructed an updated PCa ontology (PCAO2) based on the ontology development life cycle. An online information retrieval system was designed to ensure the usability of the ontology. The PCAO2 with a subclass-based taxonomic hierarchy covers the major biomedical concepts for PCa-associated genotypic, phenotypic and lifestyle data. The current version of the PCAO2 contains 633 concepts organized under three biomedical viewpoints, namely, epidemiology, diagnosis and treatment. These concepts are enriched by the addition of definition, synonym, relationship and reference. For precision diagnosis and treatment, the PCa-associated genes and lifestyles are integrated in the viewpoint of epidemiological aspects of PCa. PCAO2 provides a standardized and systematized semantic framework for studying large amounts of heterogeneous PCa data and knowledge, which can be further edited and enriched by the scientific community. The PCAO2 is freely available at https://bioportal.bioontology.org/ontologies/PCAO, http://pcaontology.net/ and http://pcaontology.net/mobile/.


Subjects
Biological Ontologies, Prostatic Neoplasms, Humans, Male, Artificial Intelligence, Semantics, Prostatic Neoplasms/genetics
14.
Nucleic Acids Res ; 52(D1): D1305-D1314, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37953304

ABSTRACT

In 2003, the Human Disease Ontology (DO, https://disease-ontology.org/) was established at Northwestern University. In the intervening 20 years, the DO has expanded to become a highly-utilized disease knowledge resource. Serving as the nomenclature and classification standard for human diseases, the DO provides a stable, etiology-based structure integrating mechanistic drivers of human disease. Over the past two decades the DO has grown from a collection of clinical vocabularies into an expertly curated semantic resource of over 11300 common and rare diseases linking disease concepts through more than 37000 vocabulary cross mappings (v2023-08-08). Here, we introduce the recently launched DO Knowledgebase (DO-KB), which expands the DO's representation of the diseaseome and enhances the findability, accessibility, interoperability and reusability (FAIR) of disease data through a new SPARQL service and new Faceted Search Interface. The DO-KB is an integrated data system, built upon the DO's semantic disease knowledge backbone, with resources that expose and connect the DO's semantic knowledge with disease-related data across Open Linked Data resources. This update includes descriptions of efforts to assess the DO's global impact and improvements to data quality and content, with emphasis on changes in the last two years.


Subjects
Ecosystem, Knowledge Bases, Humans, Rare Diseases, Semantics, Time Factors
15.
Nucleic Acids Res ; 52(W1): W540-W546, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38572754

ABSTRACT

PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly. PubTator 3.0's online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results. We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.


Subjects
PubMed, Artificial Intelligence, Humans, Software, Data Mining/methods, Semantics, Internet
16.
Proc Natl Acad Sci U S A ; 120(1): e2209153119, 2023 01 03.
Article in English | MEDLINE | ID: mdl-36574655

ABSTRACT

In the second year of life, infants begin to rapidly acquire the lexicon of their native language. A key learning mechanism underlying this acceleration is syntactic bootstrapping: the use of hidden cues in grammar to facilitate vocabulary learning. How infants forge the syntactic-semantic links that underlie this mechanism, however, remains speculative. A hurdle for theories is identifying computationally light strategies that have high precision within the complexity of the linguistic signal. Here, we presented 20-mo-old infants with novel grammatical elements in a complex natural language environment and measured their resultant vocabulary expansion. We found that infants can learn and exploit a natural language syntactic-semantic link in less than 30 min. The rapid speed of acquisition of a new syntactic bootstrap indicates that even emergent syntactic-semantic links can accelerate language learning. The results suggest that infants employ a cognitive network of efficient learning strategies to self-supervise language development.


Subjects
Learning, Semantics, Humans, Infant, Language, Vocabulary, Linguistics, Language Development
17.
Proc Natl Acad Sci U S A ; 120(39): e2220593120, 2023 09 26.
Article in English | MEDLINE | ID: mdl-37725652

ABSTRACT

I apply a recently emerging perspective on the complexity of action selection, the rate-distortion theory of control, to provide a computational-level model of errors and difficulties in human language production, which is grounded in information theory and control theory. Language production is cast as the sequential selection of actions to achieve a communicative goal subject to a capacity constraint on cognitive control. In a series of calculations, simulations, corpus analyses, and comparisons to experimental data, I show that the model directly predicts some of the major known qualitative and quantitative phenomena in language production, including semantic interference and predictability effects in word choice; accessibility-based ("easy-first") production preferences in word order alternations; and the existence and distribution of disfluencies including filled pauses, corrections, and false starts. I connect the rate-distortion view to existing models of human language production, to probabilistic models of semantics and pragmatics, and to proposals for controlled language generation in the machine learning and reinforcement learning literature.


Subjects
Language, Semantics, Humans, Communication, Information Theory, Machine Learning
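The rate-distortion framing in the abstract above can be illustrated with a generic one-step version of the problem. This is a standard textbook form assumed for illustration, not the paper's sequential model. A policy π(a | s) maximizes expected utility U(s, a) while keeping the mutual information between goals S and actions A within a capacity C. Relaxing the constraint with a trade-off parameter β gives a soft-max solution weighted by the marginal action distribution p(a). When capacity is low (small β), choices fall back toward that context-independent default.

```latex
\max_{\pi}\; \mathbb{E}_{s \sim p(s),\, a \sim \pi(\cdot \mid s)}\!\left[U(s,a)\right]
\quad \text{subject to} \quad I(S;A) \le C

\pi^{*}(a \mid s) = \frac{p(a)\, \exp\!\big(\beta\, U(s,a)\big)}{\sum_{a'} p(a')\, \exp\!\big(\beta\, U(s,a')\big)},
\qquad p(a) = \sum_{s} p(s)\, \pi^{*}(a \mid s)
```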
18.
Proc Natl Acad Sci U S A ; 120(51): e2300986120, 2023 Dec 19.
Article in English | MEDLINE | ID: mdl-38079546

ABSTRACT

How does meaning vary across the world's languages? Scholars recognize the existence of substantial variability within specific domains, ranging from nature and color to kinship. The emergence of large language models enables a systems-level approach that directly characterizes this variability through comparison of word organization across semantic domains. Here, we show that meanings across languages manifest lower variability within semantic domains and greater variability between them, using models trained on both 1) large corpora of native language text comprising Wikipedia articles in 35 languages and also 2) Test of English as a Foreign Language (TOEFL) essays written by 38,500 speakers from the same native languages, which cluster into semantic domains. Concrete meanings vary less across languages than abstract meanings, but all vary with geographical, environmental, and cultural distance. By simultaneously examining local similarity and global difference, we harmonize these findings and provide a description of general principles that govern variability in semantic space across languages. In this way, the structure of a speaker's semantic space influences the comparisons cognitively salient to them, as shaped by their native language, and suggests that even successful bilingual communicators likely think with "semantic accents" driven by associations from their native language while writing English. These findings have dramatic implications for language education, cross-cultural communication, and literal translations, which are impossible not because the objects of reference are uncertain, but because associations, metaphors, and narratives interlink meanings in different, predictable ways from one language to another.


Subjects
Language, Semantics, Humans, Communication, Language Development, Narration
19.
Proc Natl Acad Sci U S A ; 120(52): e2305414120, 2023 Dec 26.
Article in English | MEDLINE | ID: mdl-38134198

ABSTRACT

Human migration and mobility drive major societal phenomena including epidemics, economies, innovation, and the diffusion of ideas. Although human mobility and migration have been heavily constrained by geographic distance throughout history, advances and globalization are making other factors, such as language and culture, increasingly important. Advances in neural embedding models, originally designed for natural language, provide an opportunity to tame this complexity and open new avenues for the study of migration. Here, we demonstrate the ability of the model word2vec to encode nuanced relationships between discrete locations from migration trajectories, producing an accurate, dense, continuous, and meaningful vector-space representation. The resulting representation provides a functional distance between locations, as well as a "digital double" that can be distributed, re-used, and itself interrogated to understand the many dimensions of migration. We show that the unique power of word2vec to encode migration patterns stems from its mathematical equivalence with the gravity model of mobility. Focusing on the case of scientific migration, we apply word2vec to a database of three million migration trajectories of scientists derived from the affiliations listed on their publication records. Using techniques that leverage its semantic structure, we demonstrate that embeddings can learn the rich structure that underpins scientific migration, such as cultural, linguistic, and prestige relationships at multiple levels of granularity. Our results provide a theoretical foundation and methodological framework for using neural embeddings to represent and understand migration both within and beyond science.


Subjects
Language, Semantics, Humans, Machine Learning, Learning, Natural Language Processing
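The abstract above ties word2vec to the gravity model of mobility. In its simplest common form, the gravity model predicts the flow from location i to location j as proportional to the product of their "masses" (for example, numbers of researchers) divided by a power of the distance between them: T_ij = k · m_i · m_j / d_ij^γ. This form is assumed here for illustration; the paper's exact formulation, including its distance function, may differ. The masses, distances, k and γ below are made-up values.

```python
def gravity_flows(masses, dist, k=1.0, gamma=2.0):
    """Predicted flows T[i][j] = k * m_i * m_j / d_ij**gamma, with no self-flow."""
    n = len(masses)
    return [
        [0.0 if i == j else k * masses[i] * masses[j] / dist[i][j] ** gamma for j in range(n)]
        for i in range(n)
    ]


# Hypothetical masses (e.g. numbers of researchers) and pairwise distances.
masses = [100, 50, 10]
dist = [[0, 10, 20],
        [10, 0, 5],
        [20, 5, 0]]
flows = gravity_flows(masses, dist)
print(flows[0][1], flows[0][2], flows[1][2])  # 50.0 2.5 20.0
```

In this simple form the predicted flows are symmetric (T_ij = T_ji). They grow with both masses and fall off with distance whenever γ > 0.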
20.
Proc Natl Acad Sci U S A ; 120(42): e2309688120, 2023 10 17.
Article in English | MEDLINE | ID: mdl-37819984

ABSTRACT

Whether supervised or unsupervised, human and machine learning is usually characterized as event-based. However, learning may also proceed by systems alignment in which mappings are inferred between entire systems, such as visual and linguistic systems. Systems alignment is possible because items that share similar visual contexts, such as a car and a truck, will also tend to share similar linguistic contexts. Because of the mirrored similarity relationships across systems, the visual and linguistic systems can be aligned at some later time absent either input. In a series of simulation studies, we considered whether children's early concepts support systems alignment. We found that children's early concepts are close to optimal for inferring novel concepts through systems alignment, enabling agents to correctly infer more than 85% of visual-word mappings absent supervision. One possible explanation for why children's early concepts support systems alignment is that they are distinguished structurally by their dense semantic neighborhoods. Artificial agents using these structural features to select concepts proved highly effective, both in environments mirroring children's conceptual world and those that exclude the concepts that children commonly acquire. For children, systems alignment and event-based learning likely complement one another. Likewise, artificial systems can benefit from incorporating these developmental principles.


Subjects
Linguistics, Semantics, Humans, Child, Computer Simulation, Residence Characteristics
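The core intuition of systems alignment is that mirrored similarity structure alone can determine which word goes with which visual concept. A toy brute-force search illustrates this: try every one-to-one mapping between two small systems and keep the mapping whose similarity matrices agree best (lowest squared difference). This is a hypothetical sketch, not the simulation procedure from the paper. The similarity values, item order and exhaustive search (feasible only for a handful of items) are all invented for illustration.

```python
from itertools import permutations


def align_systems(sim_a, sim_b):
    """Find the mapping p (item i of system A -> item p[i] of system B) that
    minimizes the squared mismatch between the two similarity matrices."""
    n = len(sim_a)
    best, best_cost = None, float("inf")
    for p in permutations(range(n)):
        cost = sum((sim_a[i][j] - sim_b[p[i]][p[j]]) ** 2
                   for i in range(n) for j in range(n))
        if cost < best_cost:
            best, best_cost = p, cost
    return best, best_cost


# Hypothetical "visual" similarities for car, truck, apple, banana.
visual = [[1.0, 0.9, 0.1, 0.2],
          [0.9, 1.0, 0.3, 0.05],
          [0.1, 0.3, 1.0, 0.7],
          [0.2, 0.05, 0.7, 1.0]]
# Build a "linguistic" system with the same structure but its items listed in
# an order unknown to the learner: car->1, truck->3, apple->0, banana->2.
hidden = (1, 3, 0, 2)
linguistic = [[0.0] * 4 for _ in range(4)]
for i in range(4):
    for j in range(4):
        linguistic[hidden[i]][hidden[j]] = visual[i][j]

mapping, cost = align_systems(visual, linguistic)
print(mapping, cost)  # (1, 3, 0, 2) 0.0
```

Because every pairwise similarity in this toy example is distinct, only the hidden mapping reproduces the structure exactly. With noisy real-world similarities the match is only approximate, and exhaustive search grows factorially with the number of items.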