Search | BVS (Virtual Health Library) - Ministry of Health

Quilt-1M: One Million Image-Text Pairs for Histopathology.

Ikezogwo, Wisdom O; Seyfioglu, Mehmet S; Ghezloo, Fatemeh; Geva, Dylan; Mohammed, Fatwir S; Anand, Pavan K; Krishna, Ranjay; Shapiro, Linda G.

Adv Neural Inf Process Syst ; 36(DB1): 37995-38017, 2023 Dec.

Article in English | MEDLINE | ID: mdl-38742142

ABSTRACT

Recent accelerations in multi-modal applications have been made possible with the plethora of image and text data available online. However, the scarcity of analogous data in the medical field, specifically in histopathology, has halted comparable progress. To enable similar representation learning for histopathology, we turn to YouTube, an untapped resource of videos, offering 1,087 hours of valuable educational histopathology videos from expert clinicians. From YouTube, we curate Quilt: a large-scale vision-language dataset consisting of 768,826 image and text pairs. Quilt was automatically curated using a mixture of models, including large language models, handcrafted algorithms, human knowledge databases, and automatic speech recognition. In comparison, the most comprehensive datasets curated for histopathology amass only around 200K samples. We combine Quilt with datasets from other sources, including Twitter, research papers, and the internet in general, to create an even larger dataset: Quilt-1M, with 1M paired image-text samples, marking it as the largest vision-language histopathology dataset to date. We demonstrate the value of Quilt-1M by fine-tuning a pre-trained CLIP model. Our model outperforms state-of-the-art models on both zero-shot and linear probing tasks for classifying new histopathology images across 13 diverse patch-level datasets of 8 different sub-pathologies and cross-modal retrieval tasks.
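
The zero-shot classification mentioned in the abstract can be illustrated with a minimal sketch. It assumes that an image and a set of class prompts have already been embedded into a shared vector space by some CLIP-style model; the embedding values, class names, and the `zero_shot_classify` helper below are hypothetical and are not taken from the paper or from any particular library.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, class_text_embeddings):
    """Return the class whose text embedding is closest to the image embedding.

    No classifier is trained: the label is chosen purely by similarity
    between the image vector and each class prompt vector.
    """
    scores = {
        label: cosine(image_embedding, text_embedding)
        for label, text_embedding in class_text_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy 3-dimensional embeddings (hypothetical values for illustration only).
text_embeddings = {
    "adenocarcinoma": [0.9, 0.1, 0.0],
    "benign tissue": [0.1, 0.9, 0.2],
}
image_embedding = [0.8, 0.2, 0.1]

label, scores = zero_shot_classify(image_embedding, text_embeddings)
print(label, scores)
```

In a real pipeline the vectors would come from the image and text encoders of the fine-tuned model rather than being written by hand.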

Socially situated artificial intelligence enables learning from human interaction.

Krishna, Ranjay; Lee, Donsuk; Fei-Fei, Li; Bernstein, Michael S.

Proc Natl Acad Sci U S A ; 119(39): e2115730119, 2022 09 27.

Article in English | MEDLINE | ID: mdl-36122244

ABSTRACT

Regardless of how much data artificial intelligence agents have available, agents will inevitably encounter previously unseen situations in real-world deployments. Reacting to novel situations by acquiring new information from other people (socially situated learning) is a core faculty of human development. Unfortunately, socially situated learning remains an open challenge for artificial intelligence agents because they must learn how to interact with people to seek out the information that they lack. In this article, we formalize the task of socially situated artificial intelligence (agents that seek out new information through social interactions with people) as a reinforcement learning problem where the agent learns to identify meaningful and informative questions via rewards observed through social interaction. We manifest our framework as an interactive agent that learns how to ask natural language questions about photos as it broadens its visual intelligence on a large photo-sharing social network. Unlike active-learning methods, which implicitly assume that humans are oracles willing to answer any question, our agent adapts its behavior based on observed norms of which questions people are or are not interested in answering. Through an 8-month deployment where our agent interacted with 236,000 social media users, our agent improved its performance at recognizing new visual information by 112%. A controlled field experiment confirmed that our agent outperformed an active-learning baseline by 25.6%. This work advances opportunities for continuously improving artificial intelligence (AI) agents that better respect norms in open social environments.
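
The idea of learning which questions people are willing to answer from observed rewards can be sketched, in heavily simplified form, as an epsilon-greedy bandit. This is not the authors' method, which the abstract describes only as a reinforcement learning formulation; the question kinds, the response rates, and the `run_question_bandit` helper are assumptions made purely for illustration.

```python
import random


def run_question_bandit(response_rates, rounds=2000, epsilon=0.1, seed=0):
    """Simulate an agent choosing among question kinds.

    response_rates maps each (hypothetical) question kind to the probability
    that a simulated user answers it. The reward is 1.0 when a question is
    answered and 0.0 otherwise.
    """
    rng = random.Random(seed)
    kinds = list(response_rates)
    counts = {kind: 0 for kind in kinds}
    values = {kind: 0.0 for kind in kinds}  # running mean reward per kind

    for _ in range(rounds):
        if rng.random() < epsilon:
            kind = rng.choice(kinds)  # explore: try a random question kind
        else:
            kind = max(kinds, key=lambda k: values[k])  # exploit best estimate
        reward = 1.0 if rng.random() < response_rates[kind] else 0.0
        counts[kind] += 1
        # Incremental update of the mean reward for the chosen kind.
        values[kind] += (reward - values[kind]) / counts[kind]

    return counts, values


# Hypothetical response rates: users answer simple questions more often.
counts, values = run_question_bandit(
    {"what colour is this?": 0.8, "describe everything in detail": 0.1},
    rounds=500,
)
print(counts)
print(values)
```

The sketch captures only the feedback loop (ask, observe whether people respond, adjust); it ignores the informativeness of the answers, which the abstract names as part of the objective.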

Subjects

Artificial Intelligence, Psychological Reinforcement, Social Interaction, Humans, Reward, Social Norms

Scene Graph Prediction with Limited Labels.

Chen, Vincent S; Varma, Paroma; Krishna, Ranjay; Bernstein, Michael; Ré, Christopher; Fei-Fei, Li.

Proc IEEE Int Conf Comput Vis ; 2019: 2580-2590, 2019.

Article in English | MEDLINE | ID: mdl-32218709

ABSTRACT

Visual knowledge bases such as Visual Genome power numerous applications in computer vision, including visual question answering and captioning, but suffer from sparse, incomplete relationships. All scene graph models to date are limited to training on a small set of visual relationships that have thousands of training labels each. Hiring human annotators is expensive, and textual knowledge base completion methods are incompatible with visual data. In this paper, we introduce a semi-supervised method that assigns probabilistic relationship labels to a large number of unlabeled images using few labeled examples. We analyze visual relationships to suggest two types of image-agnostic features that are used to generate noisy heuristics, whose outputs are aggregated using a factor graph-based generative model. With as few as 10 labeled examples per relationship, the generative model creates enough training data to train any existing state-of-the-art scene graph model. We demonstrate that our method outperforms all baseline approaches on scene graph prediction by 5.16 recall@100 for PREDCLS. In our limited label setting, we define a complexity metric for relationships that serves as an indicator (R² = 0.778) for conditions under which our method succeeds over transfer learning, the de facto approach for training with limited labels.
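
The recall@K metric reported in the abstract can be illustrated with a small, generic sketch: the fraction of ground-truth relationship triples that appear among the K highest-scoring predictions for an image. The triples, scores, and the `recall_at_k` helper are hypothetical, and the paper's exact evaluation protocol for PREDCLS may differ in its details.

```python
def recall_at_k(scored_predictions, ground_truth, k):
    """Fraction of ground-truth triples found in the top-k predictions.

    scored_predictions: list of ((subject, predicate, object), score) pairs.
    ground_truth: set of (subject, predicate, object) triples.
    """
    if not ground_truth:
        return 0.0
    # Rank predictions from highest to lowest confidence score.
    ranked = sorted(scored_predictions, key=lambda item: item[1], reverse=True)
    top_k = {triple for triple, _ in ranked[:k]}
    hits = sum(1 for triple in ground_truth if triple in top_k)
    return hits / len(ground_truth)


# Toy predictions for a single image (hypothetical values).
predictions = [
    (("man", "riding", "horse"), 0.9),
    (("man", "wearing", "hat"), 0.7),
    (("horse", "on", "grass"), 0.4),
    (("man", "holding", "hat"), 0.2),
]
truth = {("man", "riding", "horse"), ("horse", "on", "grass")}

print(recall_at_k(predictions, truth, k=2))  # one of two triples is in the top 2
print(recall_at_k(predictions, truth, k=3))  # both triples are in the top 3
```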
