Results 1-20 of 8,584
1.
Annu Rev Immunol ; 38: 727-757, 2020 04 26.
Article in English | MEDLINE | ID: mdl-32075461

ABSTRACT

Immune cells are characterized by diversity, specificity, plasticity, and adaptability, properties that enable them to contribute to homeostasis and respond specifically and dynamically to the many threats encountered by the body. Single-cell technologies, including the assessment of transcriptomics, genomics, and proteomics at the level of individual cells, are ideally suited to studying these properties of immune cells. In this review we discuss the benefits of adopting single-cell approaches in studying underappreciated qualities of immune cells and highlight examples where these technologies have been critical to advancing our understanding of the immune system in health and disease.


Subject(s)
Immune System/immunology, Immune System/metabolism, Immunity, Single-Cell Analysis, Animals, Biomarkers, Disease Susceptibility, Homeostasis, Humans, Immune System/cytology, Molecular Imaging, Single-Cell Analysis/methods
2.
Cell ; 186(22): 4885-4897.e14, 2023 10 26.
Article in English | MEDLINE | ID: mdl-37804832

ABSTRACT

Human reasoning depends on reusing pieces of information by putting them together in new ways. However, very little is known about how compositional computation is implemented in the brain. Here, we ask participants to solve a series of problems that each require constructing a whole from a set of elements. With fMRI, we find that representations of novel constructed objects in the frontal cortex and hippocampus are relational and compositional. With MEG, we find that replay assembles elements into compounds, with each replay sequence constituting a hypothesis about a possible configuration of elements. The content of sequences evolves as participants solve each puzzle, progressing from predictable to uncertain elements and gradually converging on the correct configuration. Together, these results suggest a computational bridge between apparently distinct functions of hippocampal-prefrontal circuitry and a role for generative replay in compositional inference and hypothesis testing.


Subject(s)
Hippocampus, Prefrontal Cortex, Humans, Brain, Frontal Lobe, Hippocampus/physiology, Magnetic Resonance Imaging/methods, Neural Pathways, Prefrontal Cortex/physiology
3.
Cell ; 186(3): 497-512.e23, 2023 02 02.
Article in English | MEDLINE | ID: mdl-36657443

ABSTRACT

The human embryo breaks symmetry to form the anterior-posterior axis of the body. As the embryo elongates along this axis, progenitors in the tail bud give rise to tissues that generate spinal cord, skeleton, and musculature. This raises the question of how the embryo achieves axial elongation and patterning. While ethics necessitate in vitro studies, the variability of organoid systems has hindered mechanistic insights. Here, we developed a bioengineering and machine learning framework that optimizes organoid symmetry breaking by tuning their spatial coupling. This framework enabled reproducible generation of axially elongating organoids, each possessing a tail bud and neural tube. We discovered that an excitable system composed of WNT/FGF signaling drives elongation by inducing a neuromesodermal progenitor-like signaling center. We discovered that instabilities in the excitable system are suppressed by secreted WNT inhibitors. Absence of these inhibitors led to ectopic tail buds and branches. Our results identify mechanisms governing stable human axial elongation.


Subject(s)
Body Patterning, Mesoderm, Humans, Wnt Signaling Pathway, Embryo, Mammalian, Organoids
4.
Cell ; 185(11): 1842-1859.e18, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35561686

ABSTRACT

The precise genetic origins of the first Neolithic farming populations in Europe and Southwest Asia, as well as the processes and the timing of their differentiation, remain largely unknown. Demogenomic modeling of high-quality ancient genomes reveals that the early farmers of Anatolia and Europe emerged from a multiphase mixing of a Southwest Asian population with a strongly bottlenecked western hunter-gatherer population after the last glacial maximum. Moreover, the ancestors of the first farmers of Europe and Anatolia went through a period of extreme genetic drift during their westward range expansion, contributing highly to their genetic distinctiveness. This modeling elucidates the demographic processes at the root of the Neolithic transition and leads to a spatial interpretation of the population history of Southwest Asia and Europe during the late Pleistocene and early Holocene.


Subject(s)
Farmers, Genome, Agriculture, DNA, Mitochondrial/genetics, Europe, Genetic Drift, Genomics, History, Ancient, Human Migration, Humans
5.
Cell ; 185(10): 1646-1660.e18, 2022 05 12.
Article in English | MEDLINE | ID: mdl-35447073

ABSTRACT

Incomplete lineage sorting (ILS) makes ancestral genetic polymorphisms persist during rapid speciation events, inducing incongruences between gene trees and species trees. ILS has complicated phylogenetic inference in many lineages, including hominids. However, we lack empirical evidence that ILS leads to incongruent phenotypic variation. Here, we performed phylogenomic analyses to show that the South American monito del monte is the sister lineage of all Australian marsupials, although over 31% of its genome is closer to the Diprotodontia than to other Australian groups due to ILS during ancient radiation. Pervasive conflicting phylogenetic signals across the whole genome are consistent with some of the morphological variation among extant marsupials. We detected hundreds of genes that experienced stochastic fixation during ILS, encoding the same amino acids in non-sister species. Using functional experiments, we confirm how ILS may have directly contributed to hemiplasy in morphological traits that were established during rapid marsupial speciation ca. 60 mya.
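ILS produces gene-tree/species-tree conflict when lineages fail to coalesce during a short internal branch. For the simplest three-species case, the standard multispecies-coalescent result (a textbook illustration, not the marsupial analysis itself) can be sketched as below. It assumes a single neutral locus, no gene flow, and an internal branch length `t` measured in coalescent units (generations divided by 2N); the function name is hypothetical.

```python
import math


def ils_topology_probs(t: float) -> tuple[float, float, float]:
    """Gene-tree topology probabilities for the species tree ((A,B),C).

    t: internal branch length in coalescent units (generations / 2N).
    Returns (P(AB|C, concordant), P(AC|B), P(BC|A)).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # Probability that lineages A and B do NOT coalesce in the internal branch.
    no_coalescence = math.exp(-t)
    # If they fail to coalesce, all three lineages enter the ancestral
    # population and each of the three rooted topologies is equally likely.
    each_discordant = no_coalescence / 3.0
    concordant = 1.0 - 2.0 * each_discordant
    return concordant, each_discordant, each_discordant


for t in (0.0, 0.5, 2.0):
    print(t, [round(p, 3) for p in ils_topology_probs(t)])
```

At `t = 0` the three topologies are equally likely; as the internal branch lengthens, the discordant fraction decays exponentially, which is why rapid radiations are where ILS matters most.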


Subject(s)
Marsupialia, Animals, Australia, Evolution, Molecular, Genetic Speciation, Genome, Marsupialia/genetics, Phenotype, Phylogeny
6.
Cell ; 185(25): 4703-4716.e16, 2022 Dec 08.
Article in English | MEDLINE | ID: mdl-36455558

ABSTRACT

We report genome-wide data from 33 Ashkenazi Jews (AJ), dated to the 14th century, obtained following a salvage excavation at the medieval Jewish cemetery of Erfurt, Germany. The Erfurt individuals are genetically similar to modern AJ, but they show more variability in Eastern European-related ancestry than modern AJ. A third of the Erfurt individuals carried a mitochondrial lineage common in modern AJ and eight carried pathogenic variants known to affect AJ today. These observations, together with high levels of runs of homozygosity, suggest that the Erfurt community had already experienced the major reduction in size that affected modern AJ. The Erfurt bottleneck was more severe, implying substructure in medieval AJ. Overall, our results suggest that the AJ founder event and the acquisition of the main sources of ancestry pre-dated the 14th century and highlight late medieval genetic heterogeneity no longer present in modern AJ.


Subject(s)
Jews, White People, Humans, Jews/genetics, Genetics, Population, Genome, Human
7.
Cell ; 185(24): 4604-4620.e32, 2022 11 23.
Article in English | MEDLINE | ID: mdl-36423582

ABSTRACT

Natural and induced somatic mutations that accumulate in the genome during development record the phylogenetic relationships of cells; whether these lineage barcodes capture the complex dynamics of progenitor states remains unclear. We introduce quantitative fate mapping, an approach to reconstruct the hierarchy, commitment times, population sizes, and commitment biases of intermediate progenitor states during development based on a time-scaled phylogeny of their descendants. To reconstruct time-scaled phylogenies from lineage barcodes, we introduce Phylotime, a scalable maximum likelihood clustering approach based on a general barcoding mutagenesis model. We validate these approaches using realistic in silico and in vitro barcoding experiments. We further establish criteria for the number of cells that must be analyzed for robust quantitative fate mapping and a progenitor state coverage statistic to assess the robustness. This work demonstrates how lineage barcodes, natural or synthetic, enable analyzing progenitor fate and dynamics long after embryonic development in any organism.


Subject(s)
Embryonic Development, Cell Lineage/genetics, Retrospective Studies, Phylogeny, Mutagenesis
8.
Cell ; 184(16): 4315-4328.e17, 2021 08 05.
Article in English | MEDLINE | ID: mdl-34197734

ABSTRACT

An ability to build structured mental maps of the world underpins our capacity to imagine relationships between objects that extend beyond experience. In rodents, such representations are supported by sequential place cell reactivations during rest, known as replay. Schizophrenia is proposed to reflect a compromise in structured mental representations, with animal models reporting abnormalities in hippocampal replay and associated ripple activity during rest. Here, utilizing magnetoencephalography (MEG), we tasked patients with schizophrenia and control participants to infer unobserved relationships between objects by reorganizing visual experiences containing these objects. During a post-task rest session, controls exhibited fast spontaneous neural reactivation of presented objects that replayed inferred relationships. Replay was coincident with increased ripple power in hippocampus. Patients showed both reduced replay and augmented ripple power relative to controls, convergent with findings in animal models. These abnormalities are linked to impairments in behavioral acquisition and subsequent neural representation of task structure.


Subject(s)
Learning, Neurons/pathology, Schizophrenia/pathology, Schizophrenia/physiopathology, Alpha Rhythm/physiology, Behavior, Brain Mapping, Female, Hippocampus/physiopathology, Humans, Magnetoencephalography, Male, Models, Biological, Task Performance and Analysis
9.
Cell ; 184(11): 2825-2842.e22, 2021 05 27.
Article in English | MEDLINE | ID: mdl-33932341

ABSTRACT

Mouse embryonic development is a canonical model system for studying mammalian cell fate acquisition. Recently, single-cell atlases comprehensively charted embryonic transcriptional landscapes, yet inference of the coordinated dynamics of cells over such atlases remains challenging. Here, we introduce a temporal model for mouse gastrulation, consisting of data from 153 individually sampled embryos spanning 36 h of molecular diversification. Using algorithms and precise timing, we infer differentiation flows and lineage specification dynamics over the embryonic transcriptional manifold. Rapid transcriptional bifurcations characterize the commitment of early specialized node and blood cells. However, for most lineages, we observe combinatorial multi-furcation dynamics rather than hierarchical transcriptional transitions. In the mesoderm, dozens of transcription factors combinatorially regulate multifurcations, as we exemplify using time-matched chimeric embryos of Foxc1/Foxc2 mutants. Our study rejects the notion of differentiation being governed by a series of binary choices, providing an alternative quantitative model for cell fate acquisition.


Subject(s)
Embryonic Development/physiology, Gastrulation/physiology, Animals, Cell Differentiation, Cell Lineage, Embryo, Mammalian/cytology, Embryonic Development/genetics, Female, Gene Expression, Mice/embryology, Mice, Inbred C57BL, Mouse Embryonic Stem Cells, Pregnancy, Sequence Analysis, RNA/methods, Single-Cell Analysis/methods
10.
Cell ; 184(11): 2988-3005.e16, 2021 05 27.
Article in English | MEDLINE | ID: mdl-34019793

ABSTRACT

Clear cell renal carcinoma (ccRCC) is a heterogeneous disease with a variable post-surgical course. To assemble a comprehensive ccRCC tumor microenvironment (TME) atlas, we performed single-cell RNA sequencing (scRNA-seq) of hematopoietic and non-hematopoietic subpopulations from tumor and tumor-adjacent tissue of treatment-naive ccRCC resections. We leveraged the VIPER algorithm to quantitate single-cell protein activity and validated this approach by comparison to flow cytometry. The analysis identified key TME subpopulations, as well as their master regulators and candidate cell-cell interactions, revealing clinically relevant populations, undetectable by gene-expression analysis. Specifically, we uncovered a tumor-specific macrophage subpopulation characterized by upregulation of TREM2/APOE/C1Q, validated by spatially resolved, quantitative multispectral immunofluorescence. In a large clinical validation cohort, these markers were significantly enriched in tumors from patients who recurred following surgery. The study thus identifies TREM2/APOE/C1Q-positive macrophage infiltration as a potential prognostic biomarker for ccRCC recurrence, as well as a candidate therapeutic target.


Subject(s)
Carcinoma, Renal Cell/metabolism, Neoplasm Recurrence, Local/genetics, Tumor-Associated Macrophages/metabolism, Adult, Apolipoproteins E/genetics, Apolipoproteins E/metabolism, Biomarkers, Tumor/genetics, Carcinoma, Renal Cell/genetics, Carcinoma, Renal Cell/pathology, Cohort Studies, Female, Gene Expression/genetics, Gene Expression Regulation, Neoplastic/genetics, Humans, Kidney/metabolism, Kidney Neoplasms/pathology, Lymphocytes, Tumor-Infiltrating/pathology, Macrophages/metabolism, Male, Membrane Glycoproteins/genetics, Membrane Glycoproteins/metabolism, Middle Aged, Neoplasm Recurrence, Local/metabolism, Prognosis, Receptors, Complement/genetics, Receptors, Complement/metabolism, Receptors, Immunologic/genetics, Receptors, Immunologic/metabolism, Sequence Analysis, RNA/methods, Single-Cell Analysis/methods, Tumor Microenvironment, Tumor-Associated Macrophages/physiology
11.
Cell ; 183(1): 228-243.e21, 2020 10 01.
Article in English | MEDLINE | ID: mdl-32946810

ABSTRACT

Every day we make decisions critical for adaptation and survival. We repeat actions with known consequences. But we also draw on loosely related events to infer and imagine the outcome of entirely novel choices. These inferential decisions are thought to engage a number of brain regions; however, the underlying neuronal computation remains unknown. Here, we use a multi-day cross-species approach in humans and mice to report the functional anatomy and neuronal computation underlying inferential decisions. We show that during successful inference, the mammalian brain uses a hippocampal prospective code to forecast temporally structured learned associations. Moreover, during resting behavior, coactivation of hippocampal cells in sharp-wave/ripples represents inferred relationships that include reward, thereby "joining-the-dots" between events that have not been observed together but lead to profitable outcomes. Computing mnemonic links in this manner may provide an important mechanism to build a cognitive map that stretches beyond direct experience, thus supporting flexible behavior.


Subject(s)
Decision Making/physiology, Nerve Net/physiology, Thinking/physiology, Animals, Brain/physiology, Female, Hippocampus/metabolism, Hippocampus/physiology, Humans, Male, Memory/physiology, Mice, Mice, Inbred C57BL, Models, Neurological, Neurons/metabolism, Neurons/physiology, Prospective Studies, Young Adult
12.
Cell ; 181(5): 1146-1157.e11, 2020 05 28.
Article in English | MEDLINE | ID: mdl-32470400

ABSTRACT

We report genome-wide DNA data for 73 individuals from five archaeological sites across the Bronze and Iron Age Southern Levant. These individuals, who share the "Canaanite" material culture, can be modeled as descending from two sources: (1) earlier local Neolithic populations and (2) populations related to the Chalcolithic Zagros or the Bronze Age Caucasus. The non-local contribution increased over time, as evinced by three outliers who can be modeled as descendants of recent migrants. We show evidence that different "Canaanite" groups genetically resemble each other more than other populations. We find that Levant-related modern populations typically have substantial ancestry coming from populations related to the Chalcolithic Zagros and the Bronze Age Southern Levant. These groups also harbor ancestry from sources we cannot fully model with the available data, highlighting the critical role of post-Bronze-Age migrations into the region over the past 3,000 years.


Subject(s)
DNA, Ancient/analysis, Ethnicity/genetics, Gene Flow/genetics, Archaeology/methods, DNA, Mitochondrial/genetics, Ethnicity/history, Gene Flow/physiology, Genetic Variation/genetics, Genetics, Population/methods, Genome, Human/genetics, Genomics/methods, Haplotypes, History, Ancient, Human Migration/history, Humans, Mediterranean Region, Middle East, Sequence Analysis, DNA
13.
Cell ; 178(3): 640-652.e14, 2019 07 25.
Article in English | MEDLINE | ID: mdl-31280961

ABSTRACT

Knowledge abstracted from previous experiences can be transferred to aid new learning. Here, we asked whether such abstract knowledge immediately guides the replay of new experiences. We first trained participants on a rule defining an ordering of objects and then presented a novel set of objects in a scrambled order. Across two studies, we observed that representations of these novel objects were reactivated during a subsequent rest. As in rodents, human "replay" events occurred in sequences accelerated in time, compared to actual experience, and reversed their direction after a reward. Notably, replay did not simply recapitulate visual experience, but followed instead a sequence implied by learned abstract knowledge. Furthermore, each replay contained more than sensory representations of the relevant objects. A sensory code of object representations was preceded 50 ms by a code factorized into sequence position and sequence identity. We argue that this factorized representation facilitates the generalization of a previously learned structure to new objects.


Subject(s)
Learning, Memory, Action Potentials, Adult, Female, Hippocampus/physiology, Humans, Magnetoencephalography, Male, Photic Stimulation, Reward, Young Adult
14.
Cell ; 175(3): 835-847.e25, 2018 10 18.
Article in English | MEDLINE | ID: mdl-30340044

ABSTRACT

How transcriptional bursting relates to gene regulation is a central question that has persisted for more than a decade. Here, we measure nascent transcriptional activity in early Drosophila embryos and characterize the variability in absolute activity levels across expression boundaries. We demonstrate that boundary formation follows a common transcription principle: a single control parameter determines the distribution of transcriptional activity, regardless of gene identity, boundary position, or enhancer-promoter architecture. We infer the underlying bursting kinetics and identify the key regulatory parameter as the fraction of time a gene is in a transcriptionally active state. Unexpectedly, both the rate of polymerase initiation and the switching rates are tightly constrained across all expression levels, predicting synchronous patterning outcomes at all positions in the embryo. These results point to a shared simplicity underlying the apparently complex transcriptional processes of early embryonic patterning and indicate a path to general rules in transcriptional regulation.
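The abstract frames regulation in terms of bursting kinetics, with the fraction of time a gene spends in the active state as the key control parameter. A common way to formalize this is the two-state (telegraph) promoter model: the gene switches OFF to ON at rate `k_on`, ON to OFF at rate `k_off`, and initiates polymerase at rate `r` only while ON. The sketch below assumes that model (the abstract does not spell out its exact form), uses hypothetical rates in arbitrary time units, and checks by Gillespie simulation that the mean initiation rate equals `r * k_on / (k_on + k_off)`.

```python
import random


def telegraph_expected(k_on, k_off, r):
    """Steady-state predictions of the two-state promoter model."""
    frac_on = k_on / (k_on + k_off)  # fraction of time the gene is ON
    return frac_on, r * frac_on      # mean initiation rate = r * frac_on


def telegraph_simulate(k_on, k_off, r, t_end=10_000.0, seed=0):
    """Gillespie simulation; returns empirical (fraction ON, initiation rate)."""
    rng = random.Random(seed)
    t, on = 0.0, False
    time_on, initiations = 0.0, 0
    while True:
        # Total rate of events out of the current state.
        total = (k_off + r) if on else k_on
        dt = rng.expovariate(total)
        if t + dt >= t_end:
            if on:
                time_on += t_end - t  # credit the final partial ON interval
            break
        t += dt
        if on:
            time_on += dt
            if rng.random() < r / total:
                initiations += 1  # polymerase initiates; promoter stays ON
            else:
                on = False        # promoter switches OFF
        else:
            on = True             # promoter switches ON
    return time_on / t_end, initiations / t_end


# Hypothetical rates, not fitted values from the study.
print(telegraph_expected(k_on=1.0, k_off=1.0, r=10.0))
print(telegraph_simulate(k_on=1.0, k_off=1.0, r=10.0))
```

In this model, holding `r` and the switching timescale fixed while varying only `k_on / (k_on + k_off)` moves mean activity along a single axis, which is one way to read the abstract's "single control parameter".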


Subject(s)
Body Patterning/genetics, Gene Expression Regulation, Developmental, Transcriptional Activation, Animals, DNA-Directed RNA Polymerases/metabolism, Drosophila melanogaster, Embryo, Nonmammalian/metabolism, Models, Theoretical, Promoter Regions, Genetic
15.
Annu Rev Neurosci ; 46: 233-258, 2023 07 10.
Article in English | MEDLINE | ID: mdl-36972611

ABSTRACT

Flexible behavior requires the creation, updating, and expression of memories to depend on context. While the neural underpinnings of each of these processes have been intensively studied, recent advances in computational modeling revealed a key challenge in context-dependent learning that had been largely ignored previously: Under naturalistic conditions, context is typically uncertain, necessitating contextual inference. We review a theoretical approach to formalizing context-dependent learning in the face of contextual uncertainty and the core computations it requires. We show how this approach begins to organize a large body of disparate experimental observations, from multiple levels of brain organization (including circuits, systems, and behavior) and multiple brain regions (most prominently the prefrontal cortex, the hippocampus, and motor cortices), into a coherent framework. We argue that contextual inference may also be key to understanding continual learning in the brain. This theory-driven perspective places contextual inference as a core component of learning.


Subject(s)
Brain, Learning, Hippocampus, Prefrontal Cortex, Computer Simulation
16.
Annu Rev Neurosci ; 44: 449-473, 2021 07 08.
Article in English | MEDLINE | ID: mdl-33882258

ABSTRACT

Adaptive behavior in a complex, dynamic, and multisensory world poses some of the most fundamental computational challenges for the brain, notably inference, decision-making, learning, binding, and attention. We first discuss how the brain integrates sensory signals from the same source to support perceptual inference and decision-making by weighting them according to their momentary sensory uncertainties. We then show how observers solve the binding or causal inference problem: deciding whether signals come from common causes and should hence be integrated or else be treated independently. Next, we describe the multifarious interplay between multisensory processing and attention. We argue that attentional mechanisms are crucial to compute approximate solutions to the binding problem in naturalistic environments when complex time-varying signals arise from myriad causes. Finally, we review how the brain dynamically adapts multisensory processing to a changing world across multiple timescales.


Subject(s)
Attention, Auditory Perception, Brain, Learning, Visual Perception
17.
Am J Hum Genet ; 111(1): 165-180, 2024 01 04.
Article in English | MEDLINE | ID: mdl-38181732

ABSTRACT

Mendelian randomization uses genetic variants as instrumental variables to make causal inferences on the effect of an exposure on an outcome. Due to the recent abundance of high-powered genome-wide association studies, many putative causal exposures of interest have large numbers of independent genetic variants with which they associate, each representing a potential instrument for use in a Mendelian randomization analysis. Such polygenic analyses increase the power of the study design to detect causal effects; however, they also increase the potential for bias due to instrument invalidity. Recent attention has been given to dealing with bias caused by correlated pleiotropy, which results from violation of the "instrument strength independent of direct effect" assumption. Although methods have been proposed that can account for this bias, a number of restrictive conditions remain in many commonly used techniques. In this paper, we propose a Bayesian framework for Mendelian randomization that provides valid causal inference under very general settings. We propose the methods MR-Horse and MVMR-Horse, which can be performed without access to individual-level data, using only summary statistics of the type commonly published by genome-wide association studies, and can account for both correlated and uncorrelated pleiotropy. In simulation studies, we show that the approach retains type I error rates below nominal levels even in high-pleiotropy scenarios. We demonstrate the proposed approaches in applied examples in both univariable and multivariable settings, some with very weak instruments.
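MR-Horse itself is a Bayesian model that does not fit in a short sketch. As a point of reference, the conventional fixed-effect inverse-variance weighted (IVW) estimator, a widely used method that treats every variant as a valid instrument, can be computed from the same kind of GWAS summary statistics. The sketch below assumes independent variants and ignores uncertainty in the SNP-exposure effects; all numbers are made up. Its second call shows how a shared directional pleiotropic effect biases IVW, the kind of instrument invalidity the abstract targets.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance weighted (IVW) causal estimate.

    beta_x: SNP-exposure effects; beta_y: SNP-outcome effects;
    se_y: standard errors of beta_y. Assumes independent SNPs.
    Returns (estimate, standard_error).
    """
    if not beta_x or not (len(beta_x) == len(beta_y) == len(se_y)):
        raise ValueError("inputs must be non-empty and of equal length")
    # Each SNP's Wald ratio beta_y / beta_x is weighted by beta_x^2 / se_y^2.
    num = sum(bx * by / s**2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx**2 / s**2 for bx, s in zip(beta_x, se_y))
    return num / den, 1.0 / math.sqrt(den)


# Made-up summary statistics for four hypothetical SNPs (true effect 0.3).
beta_x = [0.10, 0.20, 0.05, 0.15]
se_y = [0.010, 0.020, 0.010, 0.015]
beta_y = [0.3 * bx for bx in beta_x]
print(ivw_estimate(beta_x, beta_y, se_y))

# The same directional pleiotropic effect (0.01) on every SNP biases IVW upward.
beta_y_pleio = [by + 0.01 for by in beta_y]
print(ivw_estimate(beta_x, beta_y_pleio, se_y))
```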


Subject(s)
Genome-Wide Association Study, Mendelian Randomization Analysis, Animals, Horses, Bayes Theorem, Computer Simulation, Multifactorial Inheritance
18.
Am J Hum Genet ; 111(8): 1717-1735, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39059387

ABSTRACT

Mendelian randomization (MR), which utilizes genetic variants as instrumental variables (IVs), has gained popularity as a method for causal inference between phenotypes using genetic data. While efforts have been made to relax IV assumptions and develop new methods for causal inference in the presence of invalid IVs due to confounding, the reliability of MR methods in real-world applications remains uncertain. Instead of using simulated datasets, we conducted a benchmark study evaluating 16 two-sample summary-level MR methods using real-world genetic datasets to provide guidelines for the best practices. Our study focused on the following crucial aspects: type I error control in the presence of various confounding scenarios (e.g., population stratification, pleiotropy, and family-level confounders like assortative mating), the accuracy of causal effect estimates, replicability, and power. By comprehensively evaluating the performance of compared methods over one thousand exposure-outcome trait pairs, our study not only provides valuable insights into the performance and limitations of the compared methods but also offers practical guidance for researchers to choose appropriate MR methods for causal inference.


Subject(s)
Benchmarking, Genome-Wide Association Study, Mendelian Randomization Analysis, Mendelian Randomization Analysis/methods, Humans, Genome-Wide Association Study/methods, Phenotype, Genetic Variation, Causality, Polymorphism, Single Nucleotide, Models, Genetic
19.
Am J Hum Genet ; 111(9): 1834-1847, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39106865

ABSTRACT

Mendelian randomization (MR) utilizes genome-wide association study (GWAS) summary data to infer causal relationships between exposures and outcomes, offering a valuable tool for identifying disease risk factors. Multivariable MR (MVMR) estimates the direct effects of multiple exposures on an outcome. This study tackles the issue of highly correlated exposures commonly observed in metabolomic data, a situation where existing MVMR methods often face reduced statistical power due to multicollinearity. We propose a robust extension of the MVMR framework that leverages constrained maximum likelihood (cML) and employs a Bayesian approach for identifying independent clusters of exposure signals. Applying our method to the UK Biobank metabolomic data for the largest Alzheimer disease (AD) cohort through a two-sample MR approach, we identified two independent signal clusters for AD: glutamine and lipids, with posterior inclusion probabilities (PIPs) of 95.0% and 81.5%, respectively. Our findings corroborate the hypothesized roles of glutamate and lipids in AD, providing quantitative support for their potential involvement.


Subject(s)
Alzheimer Disease, Bayes Theorem, Genome-Wide Association Study, Mendelian Randomization Analysis, Metabolomics, Humans, Alzheimer Disease/genetics, Metabolomics/methods, Polymorphism, Single Nucleotide, Glutamine/metabolism, Glutamine/genetics, Lipids/blood, Lipids/genetics
20.
Proc Natl Acad Sci U S A ; 121(15): e2322083121, 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38568975

ABSTRACT

While reliable data-driven decision-making hinges on high-quality labeled data, the acquisition of quality labels often involves laborious human annotations or slow and expensive scientific measurements. Machine learning is becoming an appealing alternative as sophisticated predictive techniques are being used to quickly and cheaply produce large amounts of predicted labels; e.g., predicted protein structures are used to supplement experimentally derived structures, predictions of socioeconomic indicators from satellite imagery are used to supplement accurate survey data, and so on. Since predictions are imperfect and potentially biased, this practice brings into question the validity of downstream inferences. We introduce cross-prediction: a method for valid inference powered by machine learning. With a small labeled dataset and a large unlabeled dataset, cross-prediction imputes the missing labels via machine learning and applies a form of debiasing to remedy the prediction inaccuracies. The resulting inferences achieve the desired error probability and are more powerful than those that only leverage the labeled data. Closely related is the recent proposal of prediction-powered inference [A. N. Angelopoulos, S. Bates, C. Fannjiang, M. I. Jordan, T. Zrnic, Science 382, 669-674 (2023)], which assumes that a good pretrained model is already available. We show that cross-prediction is consistently more powerful than an adaptation of prediction-powered inference in which a fraction of the labeled data is split off and used to train the model. Finally, we observe that cross-prediction gives more stable conclusions than its competitors; its CIs typically have significantly lower variability.
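For the simplest estimand, a population mean, the recipe the abstract describes can be sketched as follows: impute labels for the unlabeled data with models trained on other folds of the labeled data, then subtract the average prediction error each model makes on labeled points it never saw. This is a minimal sketch based on a reading of the abstract, not the paper's exact estimator. It assumes K-fold cross-fitting with K = 5, a least-squares line standing in for the machine-learning model, and simulated data, and it omits confidence intervals; all function names are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares line; a stand-in for any ML model."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cross_prediction_mean(x_lab, y_lab, x_unlab, k=5, fit=fit_line):
    """Estimate E[Y] from labeled and unlabeled data via K-fold cross-fitting."""
    n = len(x_lab)
    folds = [list(range(j, n, k)) for j in range(k)]
    imputed = [0.0] * len(x_unlab)
    bias = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        model = fit([x_lab[i] for i in train], [y_lab[i] for i in train])
        # Impute unlabeled outcomes, averaging the K models' predictions.
        for m, x in enumerate(x_unlab):
            imputed[m] += model(x) / k
        # Debiasing: prediction error on labeled points this model never saw.
        bias += sum(model(x_lab[i]) - y_lab[i] for i in fold)
    return statistics.fmean(imputed) - bias / n


# Simulated data: Y = 2 + 3X + noise with X ~ Uniform(0, 1), so E[Y] = 3.5.
rng = random.Random(1)
x_lab = [rng.random() for _ in range(200)]
y_lab = [2 + 3 * x + rng.gauss(0, 1) for x in x_lab]
x_unlab = [rng.random() for _ in range(20_000)]
print(cross_prediction_mean(x_lab, y_lab, x_unlab))
```

Passing a deliberately biased `fit` (for example, one that adds a constant to every prediction) leaves the estimate unchanged, because the same offset appears in the debiasing term and cancels.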
