Results 1 - 20 of 1,864

1.
Trends Biochem Sci ; 48(12): 1014-1018, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37833131

ABSTRACT

Generative artificial intelligence (AI) is a burgeoning field with widespread applications, including in science. Here, we explore two paradigms that provide insight into the capabilities and limitations of Chat Generative Pre-trained Transformer (ChatGPT): its ability to (i) define a core biological concept (the Central Dogma of molecular biology); and (ii) interpret the genetic code.


Subject(s)
Artificial Intelligence , Genetic Code , Molecular Biology
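
For context, the second probe asks the model to interpret the genetic code, that is, to translate codons into amino acids. Below is a minimal sketch of how a ground-truth translation for such a probe could be generated, assuming Biopython and an illustrative mRNA sequence (neither is taken from the article).

```python
# Sketch: generating the reference translation for a genetic-code probe.
# Biopython and the example sequence are illustrative assumptions, not from the article.
from Bio.Seq import Seq

mrna = Seq("AUGGCUUUUUAA")      # Met-Ala-Phe-Stop under the standard genetic code
protein = mrna.translate()      # uses the standard NCBI translation table by default
print(protein)                  # MAF* ('*' marks the stop codon)
```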
2.
Proc Natl Acad Sci U S A ; 120(49): e2309350120, 2023 Dec 05.
Article in English | MEDLINE | ID: mdl-38032930

ABSTRACT

The ability of recent Large Language Models (LLMs) such as GPT-3.5 and GPT-4 to generate human-like texts suggests that social scientists could use these LLMs to construct measures of semantic similarity that match human judgment. In this article, we provide an empirical test of this intuition. We use GPT-4 to construct a measure of typicality, the similarity of a text document to a concept. We evaluate its performance against other model-based typicality measures in terms of the correlation with human typicality ratings. We conduct this comparative analysis in two domains: the typicality of books in literary genres (using an existing dataset of book descriptions) and the typicality of tweets authored by US Congress members in the Democratic and Republican parties (using a novel dataset). The typicality measure produced with GPT-4 meets or exceeds the performance of the previous state-of-the-art typicality measure we introduced in a recent paper [G. Le Mens, B. Kovács, M. T. Hannan, G. Pros Rius, Sociol. Sci. 2023, 82-117 (2023)]. It accomplishes this without any training on the research data (it is zero-shot learning). This is a breakthrough because the previous state-of-the-art measure required fine-tuning an LLM on hundreds of thousands of text documents to achieve its performance.
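
A minimal sketch of the zero-shot workflow described here: ask a chat model for a typicality rating and correlate the ratings with human judgments. The query_llm helper, prompt wording, 0-100 scale, and choice of Spearman correlation are placeholders and assumptions, not the authors' protocol.

```python
# Sketch of a zero-shot typicality measure in the spirit of the abstract.
# query_llm() is a hypothetical stand-in for any chat-model API call;
# the prompt wording and 0-100 scale are assumptions, not the authors' protocol.
from scipy.stats import spearmanr

def llm_typicality(text: str, concept: str, query_llm) -> float:
    prompt = (
        f"On a scale from 0 (not at all typical) to 100 (extremely typical), "
        f"how typical is the following text of the concept '{concept}'? "
        f"Answer with a single number.\n\nText: {text}"
    )
    return float(query_llm(prompt).strip())

def evaluate(docs, concept, human_ratings, query_llm):
    """Correlate zero-shot LLM typicality ratings with human judgments."""
    llm_ratings = [llm_typicality(d, concept, query_llm) for d in docs]
    rho, p = spearmanr(llm_ratings, human_ratings)
    return rho, p
```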

3.
Proc Natl Acad Sci U S A ; 120(30): e2305016120, 2023 Jul 25.
Article in English | MEDLINE | ID: mdl-37463210

ABSTRACT

Many NLP applications require manual text annotations for a variety of tasks, notably to train classifiers or evaluate the performance of unsupervised models. Depending on the size and degree of complexity, these tasks may be conducted by crowd workers on platforms such as MTurk or by trained annotators, such as research assistants. Using four samples of tweets and news articles (n = 6,183), we show that ChatGPT outperforms crowd workers for several annotation tasks, including relevance, stance, topics, and frame detection. Across the four datasets, the zero-shot accuracy of ChatGPT exceeds that of crowd workers by about 25 percentage points on average, while ChatGPT's intercoder agreement exceeds that of both crowd workers and trained annotators for all tasks. Moreover, the per-annotation cost of ChatGPT is less than $0.003, about thirty times cheaper than MTurk. These results demonstrate the potential of large language models to drastically increase the efficiency of text classification.
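
A minimal sketch of zero-shot annotation with a chat model, here for stance detection, together with a simple intercoder-agreement check. The query_llm helper, prompt wording, and label set are assumptions, not the paper's exact instructions.

```python
# Sketch of zero-shot text annotation with an LLM, in the spirit of the abstract.
# query_llm() is a hypothetical chat-model call; the prompt and label set are
# illustrative assumptions, not the paper's exact instructions.
STANCE_LABELS = ["favor", "against", "neutral"]

def annotate_stance(tweet: str, topic: str, query_llm) -> str:
    prompt = (
        f"Classify the stance of the following tweet toward '{topic}'. "
        f"Answer with exactly one of: {', '.join(STANCE_LABELS)}.\n\nTweet: {tweet}"
    )
    answer = query_llm(prompt).strip().lower()
    return answer if answer in STANCE_LABELS else "neutral"   # crude fallback

def intercoder_agreement(label_runs: list[list[str]]) -> float:
    """Per-item exact agreement across repeated annotation runs
    (a simple rate, not a chance-corrected statistic)."""
    agree = sum(len(set(labels)) == 1 for labels in zip(*label_runs))
    return agree / len(label_runs[0])
```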

4.
Proc Natl Acad Sci U S A ; 120(44): e2313790120, 2023 Oct 31.
Article in English | MEDLINE | ID: mdl-37883432

ABSTRACT

As the use of large language models (LLMs) grows, it is important to examine whether they exhibit biases in their output. Research in cultural evolution, using transmission chain experiments, demonstrates that humans have biases to attend to, remember, and transmit some types of content over others. Here, in five preregistered experiments using material from previous studies with human participants, we use the same transmission-chain-like methodology and find that the LLM ChatGPT-3 shows biases analogous to those of humans, favoring content that is gender-stereotype-consistent, social, negative, threat-related, and biologically counterintuitive over other content. The presence of these biases in LLM output suggests that such content is widespread in its training data and could have consequential downstream effects by magnifying preexisting human tendencies toward cognitively appealing, but not necessarily informative or valuable, content.


Subject(s)
Cultural Evolution , Language , Humans , Mental Recall , Bias , Ethical Theory
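
A minimal sketch of a transmission chain run with a chat model, in which each generation's retelling becomes the next generation's input. The query_llm helper and instruction wording are assumptions, not the study's exact protocol.

```python
# Sketch of a transmission-chain run with an LLM, analogous to the methodology
# described in the abstract. query_llm() is a hypothetical chat-model call and
# the instruction wording is an assumption, not the study's exact protocol.
def transmission_chain(seed_story: str, generations: int, query_llm) -> list[str]:
    """Pass a story through successive 'retell from memory' steps."""
    outputs, story = [], seed_story
    for _ in range(generations):
        prompt = (
            "Read the following story, then retell it from memory for the next "
            "reader. Keep it roughly the same length.\n\n" + story
        )
        story = query_llm(prompt)
        outputs.append(story)
    return outputs

# Content biases are then estimated by coding which elements (e.g., social,
# negative, or threat-related details) survive across the chain's outputs.
```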
5.
Brief Bioinform ; 25(1), 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38168838

ABSTRACT

ChatGPT has drawn considerable attention from both the general public and domain experts with its remarkable text generation capabilities. This has subsequently led to the emergence of diverse applications in the field of biomedicine and health. In this work, we examine the diverse applications of large language models (LLMs), such as ChatGPT, in biomedicine and health. Specifically, we explore the areas of biomedical information retrieval, question answering, medical text summarization, information extraction, and medical education, and investigate whether LLMs possess the transformative power to revolutionize these tasks or whether the distinct complexities of the biomedical domain present unique challenges. Following an extensive literature survey, we find that significant advances have been made in text generation tasks, surpassing the previous state-of-the-art methods. For other applications, the advances have been modest. Overall, LLMs have not yet revolutionized biomedicine, but recent rapid progress indicates that such methods hold great potential to provide valuable means for accelerating discovery and improving health. We also find that the use of LLMs, like ChatGPT, in biomedicine and health entails various risks and challenges, including fabricated information in generated responses, as well as legal and privacy concerns associated with sensitive patient data. We believe this survey can provide a comprehensive and timely overview to biomedical researchers and healthcare practitioners on the opportunities and challenges associated with using ChatGPT and other LLMs for transforming biomedicine and health.


Subject(s)
Information Storage and Retrieval , Language , Humans , Privacy , Research Personnel
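
As one concrete example of the information extraction applications surveyed here, a minimal sketch of zero-shot medication extraction from a clinical note; the query_llm helper, prompt, and JSON schema are illustrative assumptions, not drawn from any surveyed system.

```python
# Sketch of one application surveyed in the abstract: zero-shot biomedical
# information extraction with an LLM. query_llm() is a hypothetical chat-model
# call; the prompt and JSON schema are illustrative assumptions.
import json

def extract_medications(clinical_note: str, query_llm) -> list[dict]:
    prompt = (
        "Extract all medications mentioned in the clinical note below. "
        "Return a JSON list of objects with keys 'drug', 'dose', and 'frequency'; "
        "use null for missing fields.\n\nNote: " + clinical_note
    )
    try:
        return json.loads(query_llm(prompt))
    except json.JSONDecodeError:
        return []   # malformed or fabricated output is one of the risks noted above
```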
6.
Methods ; 222: 133-141, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38242382

ABSTRACT

The versatility of ChatGPT in performing a diverse range of tasks has elicited considerable interest in its potential applications within professional fields. Taking drug discovery as a testbed, this paper provides a comprehensive evaluation of ChatGPT's ability on molecule property prediction. The study focuses on three aspects: 1) Effects of different prompt settings, where we investigate the impact of varying prompts on the prediction outcomes of ChatGPT; 2) Comprehensive evaluation on molecule property prediction, where we evaluate 53 ADMET-related endpoints; 3) Analysis of ChatGPT's potential and limitations, where we make comparisons with models tailored for molecule property prediction, thus gaining a more accurate understanding of ChatGPT's capabilities and limitations in this area. Through this evaluation, we find that 1) With appropriate prompt settings, ChatGPT can attain satisfactory prediction outcomes that are competitive with specialized models designed for those tasks. 2) Prompt settings significantly affect ChatGPT's performance. Among all prompt settings, the strategy for selecting few-shot examples has the greatest impact on results; scaffold sampling greatly outperforms random sampling. 3) The capacity of ChatGPT to make high-precision predictions is significantly influenced by the quality of the examples provided, which may constrain its practical applicability in real-world scenarios. This work highlights ChatGPT's potential and limitations for molecule property prediction, which we hope can inspire future design and evaluation of large language models within scientific domains.


Subject(s)
Drug Discovery , Research Design
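
A minimal sketch of scaffold-based selection of few-shot examples, the prompt strategy reported above as outperforming random sampling; the use of RDKit and the exact selection rule are assumptions, and the paper's implementation may differ.

```python
# Sketch of scaffold-based few-shot example selection for a molecule-property
# prompt. RDKit usage and the selection rule are assumptions, not the paper's code.
import random
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold(smiles: str) -> str:
    """Bemis-Murcko scaffold of a molecule, as a canonical SMILES string."""
    return MurckoScaffold.MurckoScaffoldSmiles(smiles=smiles)

def pick_few_shot(query_smiles: str, labeled_pool: list[tuple[str, float]], k: int = 4):
    """Prefer labeled molecules sharing the query's scaffold; pad randomly."""
    target = scaffold(query_smiles)
    same = [ex for ex in labeled_pool if scaffold(ex[0]) == target]
    rest = [ex for ex in labeled_pool if ex not in same]
    random.shuffle(rest)
    return (same + rest)[:k]   # these (SMILES, label) pairs become the few-shot examples
```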
7.
J Cell Physiol ; 239(7): e31339, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38924572

ABSTRACT

There is no doubt that navigating academia is a formidable challenge, particularly for those from underrepresented backgrounds who face additional barriers at every turn. In such an environment, efforts to create learning and training environments that are diverse, equitable, and inclusive can feel like an uphill battle. We believe that harnessing the power of artificial intelligence (AI) tools can help in leveling the playing field. While AI cannot supplant the need for supportive mentorship, it can serve as a vital supplement, offering guidance and assistance to those who may lack access to adequate avenues of support. Embracing AI in this context should not be stigmatized, as it may represent a vital lifeline for underrepresented individuals who often face systemic biases while forging their own paths in pursuit of success and belonging in academia. AI tools should not be gatekept from these individuals, particularly by those in positions of power and privilege within the scientific community. Instead, we argue, institutions should make a strong commitment to educating their community members on how to ethically harness these tools.


Subject(s)
Artificial Intelligence , Learning , Humans , Peer Group , Communication , Mentors
8.
Clin Infect Dis ; 78(4): 860-866, 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-37971399

ABSTRACT

Large language models (LLMs) are artificial intelligence systems trained by deep learning algorithms to process natural language and generate text responses to user prompts. Some approach physician performance on a range of medical challenges, leading some proponents to advocate for their use in clinical consultation and prompting consternation about the future of cognitive specialties. However, LLMs currently have limitations that preclude safe clinical deployment in performing specialist consultations, including frequent confabulations, lack of the contextual awareness crucial for nuanced diagnostic and treatment plans, inscrutable and unexplainable training data and methods, and a propensity to recapitulate biases. Nonetheless, considering the rapid improvement in this technology, growing calls for clinical integration, and healthcare systems that chronically undervalue cognitive specialties, it is critical that infectious diseases clinicians engage with LLMs to enable informed advocacy for how they should, and should not, be used to augment specialist care.


Subject(s)
Communicable Diseases , Drug Labeling , Humans , Artificial Intelligence , Communicable Diseases/diagnosis , Language , Referral and Consultation
9.
Clin Infect Dis ; 78(4): 825-832, 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-37823416

ABSTRACT

BACKGROUND: The development of chatbot artificial intelligence (AI) has raised major questions about its use in healthcare. We assessed the quality and safety of the management suggested by Chat Generative Pre-trained Transformer 4 (ChatGPT-4) in real-life practice for patients with positive blood cultures. METHODS: Over a 4-week period in a tertiary care hospital, data from consecutive infectious diseases (ID) consultations for a first positive blood culture were prospectively provided to ChatGPT-4, which was asked to propose a comprehensive management plan (suspected/confirmed diagnosis, workup, antibiotic therapy, source control, follow-up). We compared the management plan suggested by ChatGPT-4 with the plan suggested by ID consultants based on literature and guidelines. Comparisons were performed by 2 ID physicians not involved in patient management. RESULTS: Forty-four cases with a first episode of positive blood culture were included. ChatGPT-4 provided detailed and well-written responses in all cases. The AI's diagnoses were identical to those of the consultant in 26 (59%) cases. Suggested diagnostic workups were satisfactory (ie, no missing important diagnostic tests) in 35 (80%) cases; empirical antimicrobial therapies were adequate in 28 (64%) cases and harmful in 1 (2%). Source control plans were inadequate in 4 (9%) cases. Definitive antibiotic therapies were optimal in 16 (36%) patients and harmful in 2 (5%). Overall, management plans were considered optimal in only 1 (2%) patient, satisfactory in 17 (39%), and harmful in 7 (16%). CONCLUSIONS: The use of ChatGPT-4 without consultant input remains hazardous when seeking expert medical advice in 2023, especially for severe IDs.


Subject(s)
Physicians , Sepsis , Humans , Artificial Intelligence , Prospective Studies , Software
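
A minimal sketch of how a structured management plan with the fields compared in this study could be requested from a chat model; the query_llm helper and prompt wording are assumptions, not the study's protocol, and nothing here substitutes for consultant review.

```python
# Sketch of prompting for a structured management plan with the fields the study
# compared. query_llm() is a hypothetical chat-model call; the wording is an
# assumption, and the output would still require grading by ID physicians.
PLAN_FIELDS = ["suspected/confirmed diagnosis", "diagnostic workup",
               "empirical and definitive antibiotic therapy",
               "source control", "follow-up"]

def propose_plan(case_summary: str, query_llm) -> str:
    prompt = (
        "A patient has a first positive blood culture. Based on the anonymized "
        "case data below, propose a management plan with one section per item: "
        + "; ".join(PLAN_FIELDS) + ".\n\nCase: " + case_summary
    )
    return query_llm(prompt)

# Each section is then graded against the guideline-based consultant plan
# (e.g., optimal / satisfactory / harmful), as in the study.
```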
10.
Cancer ; 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39211977

ABSTRACT

BACKGROUND: This study evaluated the accuracy, clinical concordance, and readability of the Chat Generative Pre-trained Transformer (ChatGPT) 3.5 chatbot interface as a source of breast cancer information for patients. METHODS: Twenty questions that patients are likely to ask ChatGPT were identified by breast cancer advocates. These were posed to ChatGPT 3.5 in July 2023 and were repeated three times. Responses were graded in two domains: accuracy (4-point Likert scale, 4 = worst) and clinical concordance (information is clinically similar to a physician response; 5-point Likert scale, 5 = not similar at all). The concordance of responses across repetitions was estimated using the intraclass correlation coefficient (ICC) of word counts. Response readability was calculated using the Flesch-Kincaid readability scale. References were requested and verified. RESULTS: The overall average accuracy was 1.88 (range, 1.0-3.0; 95% confidence interval [CI], 1.42-1.94), and clinical concordance was 2.79 (range, 1.0-5.0; 95% CI, 1.94-3.64). The average word count was 310 words per response (range, 146-441 words per response) with high concordance (ICC, 0.75; 95% CI, 0.59-0.91; p < .001). The average readability was poor at 37.9 (range, 18.0-60.5) with high concordance (ICC, 0.73; 95% CI, 0.57-0.90; p < .001). There was a weak correlation between easier readability and better clinical concordance (-0.15; p = .025). Accuracy did not correlate with readability (0.05; p = .079). The average number of references was 1.97 (range, 1-4; total, 119). ChatGPT cited peer-reviewed articles only once and often referenced nonexistent websites (41%). CONCLUSIONS: Because ChatGPT 3.5 responses were incorrect 24% of the time and did not provide real references 41% of the time, patients should be cautioned about using ChatGPT for medical information.
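
A minimal sketch of the two quantitative checks reported here: a Flesch reading-ease score per response and the consistency of repeated responses via word counts. The crude syllable counter and the use of the pingouin package for the ICC are assumptions; the study's exact implementation may differ.

```python
# Sketch of readability scoring and repetition consistency for chatbot responses.
# The syllable heuristic and pingouin usage are assumptions, not the study's code.
import re
import pandas as pd
import pingouin as pg

def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease with a rough vowel-group syllable count."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    n = max(1, len(words))
    return 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n)

def word_count_icc(responses: dict[int, list[str]]) -> pd.DataFrame:
    """ICC of word counts across repetitions (questions as targets, runs as raters)."""
    rows = [{"question": q, "run": i, "words": len(r.split())}
            for q, runs in responses.items() for i, r in enumerate(runs)]
    return pg.intraclass_corr(data=pd.DataFrame(rows),
                              targets="question", raters="run", ratings="words")
```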
