Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 12 de 12
Filtrar
Más filtros










Base de datos
Intervalo de año de publicación
1.
Proc Natl Acad Sci U S A ; 121(24): e2318124121, 2024 Jun 11.
Artículo en Inglés | MEDLINE | ID: mdl-38830100

RESUMEN

There is much excitement about the opportunity to harness the power of large language models (LLMs) when building problem-solving assistants. However, the standard methodology of evaluating LLMs relies on static pairs of inputs and outputs; this is insufficient for making an informed decision about which LLMs are best to use in an interactive setting, and how that varies by setting. Static assessment therefore limits how we understand language model capabilities. We introduce CheckMate, an adaptable prototype platform for humans to interact with and evaluate LLMs. We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics, with a mixed cohort of participants from undergraduate students to professors of mathematics. We release the resulting interaction and rating dataset, MathConverse. By analyzing MathConverse, we derive a taxonomy of human query behaviors and uncover that despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness in LLM generations, among other findings. Further, we garner a more granular understanding of GPT-4 mathematical problem-solving through a series of case studies, contributed by experienced mathematicians. We conclude with actionable takeaways for ML practitioners and mathematicians: models that communicate uncertainty, respond well to user corrections, and can provide a concise rationale for their recommendations, may constitute better assistants. Humans should inspect LLM output carefully given their current shortcomings and potential for surprising fallibility.


Asunto(s)
Lenguaje , Matemática , Solución de Problemas , Humanos , Solución de Problemas/fisiología , Estudiantes/psicología
2.
Patterns (N Y) ; 4(7): 100780, 2023 Jul 14.
Artículo en Inglés | MEDLINE | ID: mdl-37521050

RESUMEN

Machine learning (ML) practitioners are increasingly tasked with developing models that are aligned with non-technical experts' values and goals. However, there has been insufficient consideration of how practitioners should translate domain expertise into ML updates. In this review, we consider how to capture interactions between practitioners and experts systematically. We devise a taxonomy to match expert feedback types with practitioner updates. A practitioner may receive feedback from an expert at the observation or domain level and then convert this feedback into updates to the dataset, loss function, or parameter space. We review existing work from ML and human-computer interaction to describe this feedback-update taxonomy and highlight the insufficient consideration given to incorporating feedback from non-technical experts. We end with a set of open questions that naturally arise from our proposed taxonomy and subsequent survey.

3.
IEEE Trans Pattern Anal Mach Intell ; 45(4): 5139-5157, 2023 Apr.
Artículo en Inglés | MEDLINE | ID: mdl-35939468

RESUMEN

Belief propagation (BP) is a popular method for performing probabilistic inference on graphical models. In this work, we enhance BP and propose self-guided belief propagation (SBP) that incorporates the pairwise potentials only gradually. This homotopy continuation method converges to a unique solution and increases the accuracy without increasing the computational burden. We provide a formal analysis to demonstrate that SBP finds the global optimum of the Bethe approximation for attractive models where all variables favor the same state. Moreover, we apply SBP to various graphs with random potentials and empirically show that: (i) SBP is superior in terms of accuracy whenever BP converges, and (ii) SBP obtains a unique, stable, and accurate solution whenever BP does not converge.

4.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 2458-2474, 2023 Feb.
Artículo en Inglés | MEDLINE | ID: mdl-35294343

RESUMEN

This paper addresses the deep face recognition problem under an open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space. To this end, hyperspherical face recognition, as a promising line of research, has attracted increasing attention and gradually become a major focus in face recognition research. As one of the earliest works in hyperspherical face recognition, SphereFace explicitly proposed to learn face embeddings with large inter-class angular margin. However, SphereFace still suffers from severe training instability which limits its application in practice. In order to address this problem, we introduce a unified framework to understand large angular margin in hyperspherical face recognition. Under this framework, we extend the study of SphereFace and propose an improved variant with substantially better training stability - SphereFace-R. Specifically, we propose two novel ways to implement the multiplicative margin, and study SphereFace-R under three different feature normalization schemes (no feature normalization, hard feature normalization and soft feature normalization). We also propose an implementation strategy - "characteristic gradient detachment" - to stabilize training. Extensive experiments on SphereFace-R show that it is consistently better than or competitive with state-of-the-art methods.

5.
Patterns (N Y) ; 3(4): 100455, 2022 Apr 08.
Artículo en Inglés | MEDLINE | ID: mdl-35465233

RESUMEN

The study of human-machine systems is central to a variety of behavioral and engineering disciplines, including management science, human factors, robotics, and human-computer interaction. Recent advances in artificial intelligence (AI) and machine learning have brought the study of human-AI teams into sharper focus. An important set of questions for those designing human-AI interfaces concerns trust, transparency, and error tolerance. Here, we review the emerging literature on this important topic, identify open questions, and discuss some of the pitfalls of human-AI team research. We present opposition (extreme algorithm aversion or distrust) and loafing (extreme automation complacency or bias) as lying at opposite ends of a spectrum, with algorithmic vigilance representing an ideal mid-point. We suggest that, while transparency may be crucial for facilitating appropriate levels of trust in AI and thus for counteracting aversive behaviors and promoting vigilance, transparency should not be conceived solely in terms of the explainability of an algorithm. Dynamic task allocation, as well as the communication of confidence and performance metrics-among other strategies-may ultimately prove more useful to users than explanations from algorithms and significantly more effective in promoting vigilance. We further suggest that, while both aversive and appreciative attitudes are detrimental to optimal human-AI team performance, strategies to curb aversion are likely to be more important in the longer term than those attempting to mitigate appreciation. Our wider aim is to channel disparate efforts in human-AI team research into a common framework and to draw attention to the ecological validity of results in this field.

8.
Science ; 374(6573): 1327-1329, 2021 Dec 10.
Artículo en Inglés | MEDLINE | ID: mdl-34882478

RESUMEN

Incident sharing, auditing, and other concrete mechanisms could help verify the trustworthiness of actors.

9.
ArXiv ; 2021 Nov 18.
Artículo en Inglés | MEDLINE | ID: mdl-34815983

RESUMEN

Artificial intelligence (AI) provides a promising substitution for streamlining COVID-19 diagnoses. However, concerns surrounding security and trustworthiness impede the collection of large-scale representative medical data, posing a considerable challenge for training a well-generalised model in clinical practices. To address this, we launch the Unified CT-COVID AI Diagnostic Initiative (UCADI), where the AI model can be distributedly trained and independently executed at each host institution under a federated learning framework (FL) without data sharing. Here we show that our FL model outperformed all the local models by a large yield (test sensitivity /specificity in China: 0.973/0.951, in the UK: 0.730/0.942), achieving comparable performance with a panel of professional radiologists. We further evaluated the model on the hold-out (collected from another two hospitals leaving out the FL) and heterogeneous (acquired with contrast materials) data, provided visual explanations for decisions made by the model, and analysed the trade-offs between the model performance and the communication costs in the federated training process. Our study is based on 9,573 chest computed tomography scans (CTs) from 3,336 patients collected from 23 hospitals located in China and the UK. Collectively, our work advanced the prospects of utilising federated learning for privacy-preserving AI in digital health.

10.
Nat Mach Intell ; 3(12): 1081-1089, 2021 Dec.
Artículo en Inglés | MEDLINE | ID: mdl-38264185

RESUMEN

Artificial intelligence provides a promising solution for streamlining COVID-19 diagnoses; however, concerns surrounding security and trustworthiness impede the collection of large-scale representative medical data, posing a considerable challenge for training a well-generalized model in clinical practices. To address this, we launch the Unified CT-COVID AI Diagnostic Initiative (UCADI), where the artificial intelligence (AI) model can be distributedly trained and independently executed at each host institution under a federated learning framework without data sharing. Here we show that our federated learning framework model considerably outperformed all of the local models (with a test sensitivity/specificity of 0.973/0.951 in China and 0.730/0.942 in the United Kingdom), achieving comparable performance with a panel of professional radiologists. We further evaluated the model on the hold-out (collected from another two hospitals without the federated learning framework) and heterogeneous (acquired with contrast materials) data, provided visual explanations for decisions made by the model, and analysed the trade-offs between the model performance and the communication costs in the federated training process. Our study is based on 9,573 chest computed tomography scans from 3,336 patients collected from 23 hospitals located in China and the United Kingdom. Collectively, our work advanced the prospects of utilizing federated learning for privacy-preserving AI in digital health.

11.
Science ; 368(6498): 1433-1434, 2020 06 26.
Artículo en Inglés | MEDLINE | ID: mdl-32587011
12.
IEEE Trans Artif Intell ; 1(1): 85-103, 2020 Aug.
Artículo en Inglés | MEDLINE | ID: mdl-37982070

RESUMEN

COVID-19, an infectious disease caused by the SARS-CoV-2 virus, was declared a pandemic by the World Health Organisation (WHO) in March 2020. By mid-August 2020, more than 21 million people have tested positive worldwide. Infections have been growing rapidly and tremendous efforts are being made to fight the disease. In this paper, we attempt to systematise the various COVID-19 research activities leveraging data science, where we define data science broadly to encompass the various methods and tools-including those from artificial intelligence (AI), machine learning (ML), statistics, modeling, simulation, and data visualization-that can be used to store, process, and extract insights from data. In addition to reviewing the rapidly growing body of recent research, we survey public datasets and repositories that can be used for further work to track COVID-19 spread and mitigation strategies. As part of this, we present a bibliometric analysis of the papers produced in this short span of time. Finally, building on these insights, we highlight common challenges and pitfalls observed across the surveyed works. We also created a live resource repository at https://github.com/Data-Science-and-COVID-19/Leveraging-Data-Science-To-Combat-COVID-19-A-Comprehensive-Review that we intend to keep updated with the latest resources including new papers and datasets.

SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA