1.
Annu Rev Immunol ; 38: 727-757, 2020 04 26.
Article in English | MEDLINE | ID: mdl-32075461

ABSTRACT

Immune cells are characterized by diversity, specificity, plasticity, and adaptability, properties that enable them to contribute to homeostasis and respond specifically and dynamically to the many threats encountered by the body. Single-cell technologies, including the assessment of transcriptomics, genomics, and proteomics at the level of individual cells, are ideally suited to studying these properties of immune cells. In this review we discuss the benefits of adopting single-cell approaches in studying underappreciated qualities of immune cells and highlight examples where these technologies have been critical to advancing our understanding of the immune system in health and disease.


Subjects
Immune System/immunology , Immune System/metabolism , Immunity , Single-Cell Analysis , Animals , Biomarkers , Disease Susceptibility , Homeostasis , Humans , Immune System/cytology , Molecular Imaging , Single-Cell Analysis/methods
2.
Cell ; 186(22): 4885-4897.e14, 2023 10 26.
Article in English | MEDLINE | ID: mdl-37804832

ABSTRACT

Human reasoning depends on reusing pieces of information by putting them together in new ways. However, very little is known about how compositional computation is implemented in the brain. Here, we ask participants to solve a series of problems that each require constructing a whole from a set of elements. With fMRI, we find that representations of novel constructed objects in the frontal cortex and hippocampus are relational and compositional. With MEG, we find that replay assembles elements into compounds, with each replay sequence constituting a hypothesis about a possible configuration of elements. The content of sequences evolves as participants solve each puzzle, progressing from predictable to uncertain elements and gradually converging on the correct configuration. Together, these results suggest a computational bridge between apparently distinct functions of hippocampal-prefrontal circuitry and a role for generative replay in compositional inference and hypothesis testing.


Subjects
Hippocampus , Prefrontal Cortex , Humans , Brain , Frontal Lobe , Hippocampus/physiology , Magnetic Resonance Imaging/methods , Neural Pathways , Prefrontal Cortex/physiology
3.
Cell ; 186(3): 497-512.e23, 2023 02 02.
Article in English | MEDLINE | ID: mdl-36657443

ABSTRACT

The human embryo breaks symmetry to form the anterior-posterior axis of the body. As the embryo elongates along this axis, progenitors in the tail bud give rise to tissues that generate spinal cord, skeleton, and musculature. This raises the question of how the embryo achieves axial elongation and patterning. While ethics necessitate in vitro studies, the variability of organoid systems has hindered mechanistic insights. Here, we developed a bioengineering and machine learning framework that optimizes organoid symmetry breaking by tuning their spatial coupling. This framework enabled reproducible generation of axially elongating organoids, each possessing a tail bud and neural tube. We discovered that an excitable system composed of WNT/FGF signaling drives elongation by inducing a neuromesodermal progenitor-like signaling center. We discovered that instabilities in the excitable system are suppressed by secreted WNT inhibitors. Absence of these inhibitors led to ectopic tail buds and branches. Our results identify mechanisms governing stable human axial elongation.


Subjects
Body Patterning , Mesoderm , Humans , Wnt Signaling Pathway , Embryo, Mammalian , Organoids
4.
Cell ; 185(25): 4703-4716.e16, 2022 Dec 08.
Article in English | MEDLINE | ID: mdl-36455558

ABSTRACT

We report genome-wide data from 33 Ashkenazi Jews (AJ), dated to the 14th century, obtained following a salvage excavation at the medieval Jewish cemetery of Erfurt, Germany. The Erfurt individuals are genetically similar to modern AJ, but they show more variability in Eastern European-related ancestry than modern AJ. A third of the Erfurt individuals carried a mitochondrial lineage common in modern AJ and eight carried pathogenic variants known to affect AJ today. These observations, together with high levels of runs of homozygosity, suggest that the Erfurt community had already experienced the major reduction in size that affected modern AJ. The Erfurt bottleneck was more severe, implying substructure in medieval AJ. Overall, our results suggest that the AJ founder event and the acquisition of the main sources of ancestry pre-dated the 14th century and highlight late medieval genetic heterogeneity no longer present in modern AJ.


Subjects
Jews , White People , Humans , Jews/genetics , Genetics, Population , Genome, Human
5.
Cell ; 185(10): 1646-1660.e18, 2022 05 12.
Article in English | MEDLINE | ID: mdl-35447073

ABSTRACT

Incomplete lineage sorting (ILS) makes ancestral genetic polymorphisms persist during rapid speciation events, inducing incongruences between gene trees and species trees. ILS has complicated phylogenetic inference in many lineages, including hominids. However, we lack empirical evidence that ILS leads to incongruent phenotypic variation. Here, we performed phylogenomic analyses to show that the South American monito del monte is the sister lineage of all Australian marsupials, although over 31% of its genome is closer to the Diprotodontia than to other Australian groups due to ILS during ancient radiation. Pervasive conflicting phylogenetic signals across the whole genome are consistent with some of the morphological variation among extant marsupials. We detected hundreds of genes that experienced stochastic fixation during ILS, encoding the same amino acids in non-sister species. Using functional experiments, we confirm how ILS may have directly contributed to hemiplasy in morphological traits that were established during rapid marsupial speciation ca. 60 mya.


Subjects
Marsupialia , Animals , Australia , Evolution, Molecular , Genetic Speciation , Genome , Marsupialia/genetics , Phenotype , Phylogeny
6.
Cell ; 185(11): 1842-1859.e18, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35561686

ABSTRACT

The precise genetic origins of the first Neolithic farming populations in Europe and Southwest Asia, as well as the processes and the timing of their differentiation, remain largely unknown. Demogenomic modeling of high-quality ancient genomes reveals that the early farmers of Anatolia and Europe emerged from a multiphase mixing of a Southwest Asian population with a strongly bottlenecked western hunter-gatherer population after the last glacial maximum. Moreover, the ancestors of the first farmers of Europe and Anatolia went through a period of extreme genetic drift during their westward range expansion, contributing highly to their genetic distinctiveness. This modeling elucidates the demographic processes at the root of the Neolithic transition and leads to a spatial interpretation of the population history of Southwest Asia and Europe during the late Pleistocene and early Holocene.


Subjects
Farmers , Genome , Agriculture , DNA, Mitochondrial/genetics , Europe , Genetic Drift , Genomics , History, Ancient , Human Migration , Humans
7.
Cell ; 185(24): 4604-4620.e32, 2022 11 23.
Article in English | MEDLINE | ID: mdl-36423582

ABSTRACT

Natural and induced somatic mutations that accumulate in the genome during development record the phylogenetic relationships of cells; whether these lineage barcodes capture the complex dynamics of progenitor states remains unclear. We introduce quantitative fate mapping, an approach to reconstruct the hierarchy, commitment times, population sizes, and commitment biases of intermediate progenitor states during development based on a time-scaled phylogeny of their descendants. To reconstruct time-scaled phylogenies from lineage barcodes, we introduce Phylotime, a scalable maximum likelihood clustering approach based on a general barcoding mutagenesis model. We validate these approaches using realistic in silico and in vitro barcoding experiments. We further establish criteria for the number of cells that must be analyzed for robust quantitative fate mapping and a progenitor state coverage statistic to assess the robustness. This work demonstrates how lineage barcodes, natural or synthetic, enable analyzing progenitor fate and dynamics long after embryonic development in any organism.


Subjects
Embryonic Development , Cell Lineage/genetics , Retrospective Studies , Phylogeny , Mutagenesis
8.
Cell ; 184(16): 4315-4328.e17, 2021 08 05.
Article in English | MEDLINE | ID: mdl-34197734

ABSTRACT

An ability to build structured mental maps of the world underpins our capacity to imagine relationships between objects that extend beyond experience. In rodents, such representations are supported by sequential place cell reactivations during rest, known as replay. Schizophrenia is proposed to reflect a compromise in structured mental representations, with animal models reporting abnormalities in hippocampal replay and associated ripple activity during rest. Here, utilizing magnetoencephalography (MEG), we tasked patients with schizophrenia and control participants to infer unobserved relationships between objects by reorganizing visual experiences containing these objects. During a post-task rest session, controls exhibited fast spontaneous neural reactivation of presented objects that replayed inferred relationships. Replay was coincident with increased ripple power in hippocampus. Patients showed both reduced replay and augmented ripple power relative to controls, convergent with findings in animal models. These abnormalities are linked to impairments in behavioral acquisition and subsequent neural representation of task structure.


Subjects
Learning , Neurons/pathology , Schizophrenia/pathology , Schizophrenia/physiopathology , Alpha Rhythm/physiology , Behavior , Brain Mapping , Female , Hippocampus/physiopathology , Humans , Magnetoencephalography , Male , Models, Biological , Task Performance and Analysis
9.
Cell ; 184(11): 2825-2842.e22, 2021 05 27.
Article in English | MEDLINE | ID: mdl-33932341

ABSTRACT

Mouse embryonic development is a canonical model system for studying mammalian cell fate acquisition. Recently, single-cell atlases comprehensively charted embryonic transcriptional landscapes, yet inference of the coordinated dynamics of cells over such atlases remains challenging. Here, we introduce a temporal model for mouse gastrulation, consisting of data from 153 individually sampled embryos spanning 36 h of molecular diversification. Using algorithms and precise timing, we infer differentiation flows and lineage specification dynamics over the embryonic transcriptional manifold. Rapid transcriptional bifurcations characterize the commitment of early specialized node and blood cells. However, for most lineages, we observe combinatorial multi-furcation dynamics rather than hierarchical transcriptional transitions. In the mesoderm, dozens of transcription factors combinatorially regulate multifurcations, as we exemplify using time-matched chimeric embryos of Foxc1/Foxc2 mutants. Our study rejects the notion of differentiation being governed by a series of binary choices, providing an alternative quantitative model for cell fate acquisition.


Subjects
Embryonic Development/physiology , Gastrulation/physiology , Animals , Cell Differentiation , Cell Lineage , Embryo, Mammalian/cytology , Embryonic Development/genetics , Female , Gene Expression , Mice/embryology , Mice, Inbred C57BL , Mouse Embryonic Stem Cells , Pregnancy , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
10.
Cell ; 184(11): 2988-3005.e16, 2021 05 27.
Article in English | MEDLINE | ID: mdl-34019793

ABSTRACT

Clear cell renal carcinoma (ccRCC) is a heterogeneous disease with a variable post-surgical course. To assemble a comprehensive ccRCC tumor microenvironment (TME) atlas, we performed single-cell RNA sequencing (scRNA-seq) of hematopoietic and non-hematopoietic subpopulations from tumor and tumor-adjacent tissue of treatment-naive ccRCC resections. We leveraged the VIPER algorithm to quantitate single-cell protein activity and validated this approach by comparison to flow cytometry. The analysis identified key TME subpopulations, as well as their master regulators and candidate cell-cell interactions, revealing clinically relevant populations, undetectable by gene-expression analysis. Specifically, we uncovered a tumor-specific macrophage subpopulation characterized by upregulation of TREM2/APOE/C1Q, validated by spatially resolved, quantitative multispectral immunofluorescence. In a large clinical validation cohort, these markers were significantly enriched in tumors from patients who recurred following surgery. The study thus identifies TREM2/APOE/C1Q-positive macrophage infiltration as a potential prognostic biomarker for ccRCC recurrence, as well as a candidate therapeutic target.


Subjects
Carcinoma, Renal Cell/metabolism , Neoplasm Recurrence, Local/genetics , Tumor-Associated Macrophages/metabolism , Adult , Apolipoproteins E/genetics , Apolipoproteins E/metabolism , Biomarkers, Tumor/genetics , Carcinoma, Renal Cell/genetics , Carcinoma, Renal Cell/pathology , Cohort Studies , Female , Gene Expression/genetics , Gene Expression Regulation, Neoplastic/genetics , Humans , Kidney/metabolism , Kidney Neoplasms/pathology , Lymphocytes, Tumor-Infiltrating/pathology , Macrophages/metabolism , Male , Membrane Glycoproteins/genetics , Membrane Glycoproteins/metabolism , Middle Aged , Neoplasm Recurrence, Local/metabolism , Prognosis , Receptors, Complement/genetics , Receptors, Complement/metabolism , Receptors, Immunologic/genetics , Receptors, Immunologic/metabolism , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Tumor Microenvironment , Tumor-Associated Macrophages/physiology
11.
Cell ; 183(1): 228-243.e21, 2020 10 01.
Article in English | MEDLINE | ID: mdl-32946810

ABSTRACT

Every day we make decisions critical for adaptation and survival. We repeat actions with known consequences. But we also draw on loosely related events to infer and imagine the outcome of entirely novel choices. These inferential decisions are thought to engage a number of brain regions; however, the underlying neuronal computation remains unknown. Here, we use a multi-day cross-species approach in humans and mice to report the functional anatomy and neuronal computation underlying inferential decisions. We show that during successful inference, the mammalian brain uses a hippocampal prospective code to forecast temporally structured learned associations. Moreover, during resting behavior, coactivation of hippocampal cells in sharp-wave/ripples represent inferred relationships that include reward, thereby "joining-the-dots" between events that have not been observed together but lead to profitable outcomes. Computing mnemonic links in this manner may provide an important mechanism to build a cognitive map that stretches beyond direct experience, thus supporting flexible behavior.


Subjects
Decision Making/physiology , Nerve Net/physiology , Thinking/physiology , Animals , Brain/physiology , Female , Hippocampus/metabolism , Hippocampus/physiology , Humans , Male , Memory/physiology , Mice , Mice, Inbred C57BL , Models, Neurological , Neurons/metabolism , Neurons/physiology , Prospective Studies , Young Adult
12.
Cell ; 181(5): 1146-1157.e11, 2020 05 28.
Article in English | MEDLINE | ID: mdl-32470400

ABSTRACT

We report genome-wide DNA data for 73 individuals from five archaeological sites across the Bronze and Iron Ages Southern Levant. These individuals, who share the "Canaanite" material culture, can be modeled as descending from two sources: (1) earlier local Neolithic populations and (2) populations related to the Chalcolithic Zagros or the Bronze Age Caucasus. The non-local contribution increased over time, as evinced by three outliers who can be modeled as descendants of recent migrants. We show evidence that different "Canaanite" groups genetically resemble each other more than other populations. We find that Levant-related modern populations typically have substantial ancestry coming from populations related to the Chalcolithic Zagros and the Bronze Age Southern Levant. These groups also harbor ancestry from sources we cannot fully model with the available data, highlighting the critical role of post-Bronze-Age migrations into the region over the past 3,000 years.


Subjects
DNA, Ancient/analysis , Ethnicity/genetics , Gene Flow/genetics , Archaeology/methods , DNA, Mitochondrial/genetics , Ethnicity/history , Gene Flow/physiology , Genetic Variation/genetics , Genetics, Population/methods , Genome, Human/genetics , Genomics/methods , Haplotypes , History, Ancient , Human Migration/history , Humans , Mediterranean Region , Middle East , Sequence Analysis, DNA
13.
Cell ; 178(3): 640-652.e14, 2019 07 25.
Article in English | MEDLINE | ID: mdl-31280961

ABSTRACT

Knowledge abstracted from previous experiences can be transferred to aid new learning. Here, we asked whether such abstract knowledge immediately guides the replay of new experiences. We first trained participants on a rule defining an ordering of objects and then presented a novel set of objects in a scrambled order. Across two studies, we observed that representations of these novel objects were reactivated during a subsequent rest. As in rodents, human "replay" events occurred in sequences accelerated in time, compared to actual experience, and reversed their direction after a reward. Notably, replay did not simply recapitulate visual experience, but followed instead a sequence implied by learned abstract knowledge. Furthermore, each replay contained more than sensory representations of the relevant objects. A sensory code of object representations was preceded 50 ms by a code factorized into sequence position and sequence identity. We argue that this factorized representation facilitates the generalization of a previously learned structure to new objects.


Subjects
Learning , Memory , Action Potentials , Adult , Female , Hippocampus/physiology , Humans , Magnetoencephalography , Male , Photic Stimulation , Reward , Young Adult
14.
Cell ; 175(3): 835-847.e25, 2018 10 18.
Article in English | MEDLINE | ID: mdl-30340044

ABSTRACT

How transcriptional bursting relates to gene regulation is a central question that has persisted for more than a decade. Here, we measure nascent transcriptional activity in early Drosophila embryos and characterize the variability in absolute activity levels across expression boundaries. We demonstrate that boundary formation follows a common transcription principle: a single control parameter determines the distribution of transcriptional activity, regardless of gene identity, boundary position, or enhancer-promoter architecture. We infer the underlying bursting kinetics and identify the key regulatory parameter as the fraction of time a gene is in a transcriptionally active state. Unexpectedly, both the rate of polymerase initiation and the switching rates are tightly constrained across all expression levels, predicting synchronous patterning outcomes at all positions in the embryo. These results point to a shared simplicity underlying the apparently complex transcriptional processes of early embryonic patterning and indicate a path to general rules in transcriptional regulation.


Subjects
Body Patterning/genetics , Gene Expression Regulation, Developmental , Transcriptional Activation , Animals , DNA-Directed RNA Polymerases/metabolism , Drosophila melanogaster , Embryo, Nonmammalian/metabolism , Models, Theoretical , Promoter Regions, Genetic
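
The abstract for entry 14 names the fraction of time a gene spends in a transcriptionally active state as the key regulatory parameter. As a minimal illustration, assuming the textbook two-state (ON/OFF) promoter model rather than the authors' own inference procedure, that fraction and the resulting mean initiation rate can be sketched as follows. The function names and rate values are hypothetical.

```python
def active_fraction(k_on, k_off):
    """Steady-state fraction of time the promoter is ON in a two-state model."""
    return k_on / (k_on + k_off)


def mean_initiation_rate(k_ini, k_on, k_off):
    """Mean polymerase initiation rate: ON-state rate times the ON fraction."""
    return k_ini * active_fraction(k_on, k_off)


# Hypothetical rates in arbitrary units (not values from the study).
# The initiation rate k_ini is held fixed; only the ON fraction changes.
low = mean_initiation_rate(k_ini=8.0, k_on=1.0, k_off=3.0)   # ON 25% of the time
high = mean_initiation_rate(k_ini=8.0, k_on=3.0, k_off=1.0)  # ON 75% of the time
```

In this toy setting the mean output changes threefold while `k_ini` stays constant, which is the sense in which a single occupancy parameter can set expression level.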
15.
Annu Rev Neurosci ; 46: 233-258, 2023 07 10.
Article in English | MEDLINE | ID: mdl-36972611

ABSTRACT

Flexible behavior requires the creation, updating, and expression of memories to depend on context. While the neural underpinnings of each of these processes have been intensively studied, recent advances in computational modeling revealed a key challenge in context-dependent learning that had been largely ignored previously: Under naturalistic conditions, context is typically uncertain, necessitating contextual inference. We review a theoretical approach to formalizing context-dependent learning in the face of contextual uncertainty and the core computations it requires. We show how this approach begins to organize a large body of disparate experimental observations, from multiple levels of brain organization (including circuits, systems, and behavior) and multiple brain regions (most prominently the prefrontal cortex, the hippocampus, and motor cortices), into a coherent framework. We argue that contextual inference may also be key to understanding continual learning in the brain. This theory-driven perspective places contextual inference as a core component of learning.


Subjects
Brain , Learning , Hippocampus , Prefrontal Cortex , Computer Simulation
16.
Annu Rev Neurosci ; 44: 449-473, 2021 07 08.
Article in English | MEDLINE | ID: mdl-33882258

ABSTRACT

Adaptive behavior in a complex, dynamic, and multisensory world poses some of the most fundamental computational challenges for the brain, notably inference, decision-making, learning, binding, and attention. We first discuss how the brain integrates sensory signals from the same source to support perceptual inference and decision-making by weighting them according to their momentary sensory uncertainties. We then show how observers solve the binding or causal inference problem, that is, deciding whether signals come from common causes and should hence be integrated or else be treated independently. Next, we describe the multifarious interplay between multisensory processing and attention. We argue that attentional mechanisms are crucial to compute approximate solutions to the binding problem in naturalistic environments when complex time-varying signals arise from myriad causes. Finally, we review how the brain dynamically adapts multisensory processing to a changing world across multiple timescales.


Subjects
Attention , Auditory Perception , Brain , Learning , Visual Perception
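
The abstract for entry 16 describes integrating sensory signals by weighting them according to their uncertainties. A minimal sketch of the standard reliability-weighted (inverse-variance) combination rule, assuming independent Gaussian cues from a common source, is given below. The function name and the example numbers are hypothetical and are not taken from the review.

```python
def combine_cues(estimates, sigmas):
    """Fuse independent Gaussian cues, weighting each by its precision 1/sigma^2."""
    precisions = [1.0 / (s ** 2) for s in sigmas]
    total = sum(precisions)
    weights = [p / total for p in precisions]
    fused = sum(w * x for w, x in zip(weights, estimates))
    # The fused estimate is at least as precise as the most reliable single cue.
    fused_sigma = (1.0 / total) ** 0.5
    return fused, fused_sigma, weights


# Hypothetical example: a visual cue (sigma = 1) and an auditory cue (sigma = 2).
fused, fused_sigma, weights = combine_cues([10.0, 14.0], [1.0, 2.0])
```

Here the more reliable cue receives weight 0.8 and the fused estimate lies closer to it. This rule applies only when the cues are assumed to share a cause; the causal inference problem discussed in the abstract concerns deciding whether that assumption holds.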
17.
Am J Hum Genet ; 111(1): 165-180, 2024 01 04.
Article in English | MEDLINE | ID: mdl-38181732

ABSTRACT

Mendelian randomization uses genetic variants as instrumental variables to make causal inferences on the effect of an exposure on an outcome. Due to the recent abundance of high-powered genome-wide association studies, many putative causal exposures of interest have large numbers of independent genetic variants with which they associate, each representing a potential instrument for use in a Mendelian randomization analysis. Such polygenic analyses increase the power of the study design to detect causal effects; however, they also increase the potential for bias due to instrument invalidity. Recent attention has been given to dealing with bias caused by correlated pleiotropy, which results from violation of the "instrument strength independent of direct effect" assumption. Although methods have been proposed that can account for this bias, a number of restrictive conditions remain in many commonly used techniques. In this paper, we propose a Bayesian framework for Mendelian randomization that provides valid causal inference under very general settings. We propose the methods MR-Horse and MVMR-Horse, which can be performed without access to individual-level data, using only summary statistics of the type commonly published by genome-wide association studies, and can account for both correlated and uncorrelated pleiotropy. In simulation studies, we show that the approach retains type I error rates below nominal levels even in high-pleiotropy scenarios. We demonstrate the proposed approaches in applied examples in both univariable and multivariable settings, some with very weak instruments.


Subjects
Genome-Wide Association Study , Mendelian Randomization Analysis , Animals , Horses , Bayes Theorem , Computer Simulation , Multifactorial Inheritance
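
For orientation on entry 17, the sketch below shows how summary statistics alone can yield a causal estimate in Mendelian randomization. It uses the conventional fixed-effect inverse-variance weighted (IVW) estimator, not the MR-Horse method proposed in the paper, and it assumes that every variant is a valid instrument, which is the assumption the paper relaxes. The function name and the input values are hypothetical.

```python
def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW causal estimate from per-variant summary statistics.

    beta_x: variant-exposure associations
    beta_y: variant-outcome associations
    se_y:   standard errors of the variant-outcome associations
    """
    num = sum(bx * by / se ** 2 for bx, by, se in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_x, se_y))
    estimate = num / den
    std_error = (1.0 / den) ** 0.5
    return estimate, std_error


# Hypothetical summary statistics for three independent variants.
# Each per-variant ratio beta_y / beta_x equals 0.5, so there is no pleiotropy.
est, se = ivw_estimate(
    beta_x=[0.1, 0.2, 0.4],
    beta_y=[0.05, 0.1, 0.2],
    se_y=[0.01, 0.02, 0.02],
)
```

When some variants affect the outcome through other pathways, the per-variant ratios disagree and this estimator can be biased, which motivates pleiotropy-robust approaches such as the one described in the abstract.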
18.
Am J Hum Genet ; 111(8): 1717-1735, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39059387

ABSTRACT

Mendelian randomization (MR), which utilizes genetic variants as instrumental variables (IVs), has gained popularity as a method for causal inference between phenotypes using genetic data. While efforts have been made to relax IV assumptions and develop new methods for causal inference in the presence of invalid IVs due to confounding, the reliability of MR methods in real-world applications remains uncertain. Instead of using simulated datasets, we conducted a benchmark study evaluating 16 two-sample summary-level MR methods using real-world genetic datasets to provide guidelines for the best practices. Our study focused on the following crucial aspects: type I error control in the presence of various confounding scenarios (e.g., population stratification, pleiotropy, and family-level confounders like assortative mating), the accuracy of causal effect estimates, replicability, and power. By comprehensively evaluating the performance of compared methods over one thousand exposure-outcome trait pairs, our study not only provides valuable insights into the performance and limitations of the compared methods but also offers practical guidance for researchers to choose appropriate MR methods for causal inference.


Subjects
Benchmarking , Genome-Wide Association Study , Mendelian Randomization Analysis , Mendelian Randomization Analysis/methods , Humans , Genome-Wide Association Study/methods , Phenotype , Genetic Variation , Causality , Polymorphism, Single Nucleotide , Models, Genetic
19.
Am J Hum Genet ; 111(9): 1834-1847, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39106865

ABSTRACT

Mendelian randomization (MR) utilizes genome-wide association study (GWAS) summary data to infer causal relationships between exposures and outcomes, offering a valuable tool for identifying disease risk factors. Multivariable MR (MVMR) estimates the direct effects of multiple exposures on an outcome. This study tackles the issue of highly correlated exposures commonly observed in metabolomic data, a situation where existing MVMR methods often face reduced statistical power due to multicollinearity. We propose a robust extension of the MVMR framework that leverages constrained maximum likelihood (cML) and employs a Bayesian approach for identifying independent clusters of exposure signals. Applying our method to the UK Biobank metabolomic data for the largest Alzheimer disease (AD) cohort through a two-sample MR approach, we identified two independent signal clusters for AD: glutamine and lipids, with posterior inclusion probabilities (PIPs) of 95.0% and 81.5%, respectively. Our findings corroborate the hypothesized roles of glutamate and lipids in AD, providing quantitative support for their potential involvement.


Subjects
Alzheimer Disease , Bayes Theorem , Genome-Wide Association Study , Mendelian Randomization Analysis , Metabolomics , Humans , Alzheimer Disease/genetics , Metabolomics/methods , Polymorphism, Single Nucleotide , Glutamine/metabolism , Glutamine/genetics , Lipids/blood , Lipids/genetics
20.
Proc Natl Acad Sci U S A ; 121(15): e2322083121, 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38568975

ABSTRACT

While reliable data-driven decision-making hinges on high-quality labeled data, the acquisition of quality labels often involves laborious human annotations or slow and expensive scientific measurements. Machine learning is becoming an appealing alternative as sophisticated predictive techniques are being used to quickly and cheaply produce large amounts of predicted labels; e.g., predicted protein structures are used to supplement experimentally derived structures, predictions of socioeconomic indicators from satellite imagery are used to supplement accurate survey data, and so on. Since predictions are imperfect and potentially biased, this practice brings into question the validity of downstream inferences. We introduce cross-prediction: a method for valid inference powered by machine learning. With a small labeled dataset and a large unlabeled dataset, cross-prediction imputes the missing labels via machine learning and applies a form of debiasing to remedy the prediction inaccuracies. The resulting inferences achieve the desired error probability and are more powerful than those that only leverage the labeled data. Closely related is the recent proposal of prediction-powered inference [A. N. Angelopoulos, S. Bates, C. Fannjiang, M. I. Jordan, T. Zrnic, Science 382, 669-674 (2023)], which assumes that a good pretrained model is already available. We show that cross-prediction is consistently more powerful than an adaptation of prediction-powered inference in which a fraction of the labeled data is split off and used to train the model. Finally, we observe that cross-prediction gives more stable conclusions than its competitors; its CIs typically have significantly lower variability.
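
The impute-then-debias idea in the abstract for entry 20 can be illustrated for the simplest target, a population mean. The sketch below is an assumption-laden reading of the abstract, not the authors' implementation: it assumes K-fold cross-fitting, substitutes a one-feature least-squares line for the machine learning model, and omits the confidence-interval construction entirely. All function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def cross_prediction_mean(x_lab, y_lab, x_unlab, n_folds=5):
    """Point estimate of E[Y]: imputed mean on unlabeled data plus a
    debiasing term built from held-out residuals on the labeled data."""
    n = len(x_lab)
    folds = [list(range(k, n, n_folds)) for k in range(n_folds)]
    imputed_sum = 0.0   # sum over folds of the mean prediction on unlabeled data
    residual_sum = 0.0  # sum of held-out residuals y - f_k(x)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        model = fit_line([x_lab[i] for i in train], [y_lab[i] for i in train])
        imputed_sum += sum(model(x) for x in x_unlab) / len(x_unlab)
        residual_sum += sum(y_lab[i] - model(x_lab[i]) for i in held_out)
    return imputed_sum / n_folds + residual_sum / n


# Hypothetical noise-free data with y = 2 + 3x, so every fold model is exact.
x_lab = [float(i) for i in range(10)]
y_lab = [2.0 + 3.0 * x for x in x_lab]
x_unlab = [0.0, 1.0, 2.0, 3.0]
theta = cross_prediction_mean(x_lab, y_lab, x_unlab)
```

Because each labeled point is scored only by a model that did not train on it, the residual term corrects for systematic prediction error without requiring a separate pretrained model.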
