Results 1 - 20 of 115
2.
EBioMedicine ; 104: 105174, 2024 May 30.
Article in English | MEDLINE | ID: mdl-38821021

ABSTRACT

BACKGROUND: Chest X-rays (CXR) are essential for diagnosing a variety of conditions, but when models are applied to new populations, generalizability issues limit their efficacy. Generative AI, particularly denoising diffusion probabilistic models (DDPMs), offers a promising approach to generating synthetic images, enhancing dataset diversity. This study investigates the impact of synthetic data supplementation on the performance and generalizability of models in medical imaging research. METHODS: The study employed DDPMs to create synthetic CXRs conditioned on demographic and pathological characteristics from the CheXpert dataset. These synthetic images were used to supplement training datasets for pathology classifiers, with the aim of improving their performance. The evaluation involved three datasets (CheXpert, MIMIC-CXR, and Emory Chest X-ray) and various experiments, including supplementing real data with synthetic data, training with purely synthetic data, and mixing synthetic data with external datasets. Performance was assessed using the area under the receiver operating characteristic curve (AUROC). FINDINGS: Adding synthetic data to real datasets resulted in a notable increase in AUROC values (up to 0.02 in internal and external test sets with 1000% supplementation, p-value <0.01 in all instances). When classifiers were trained exclusively on synthetic data, they achieved performance levels comparable to those trained on real data with 200%-300% data supplementation. The combination of real and synthetic data from different sources demonstrated enhanced model generalizability, increasing model AUROC from 0.76 to 0.80 on the internal test set (p-value <0.01). INTERPRETATION: Synthetic data supplementation significantly improves the performance and generalizability of pathology classifiers in medical imaging. FUNDING: Dr. Gichoya is a 2022 Robert Wood Johnson Foundation Harold Amos Medical Faculty Development Program Scholar and declares support from RSNA Health Disparities grant (#EIHD2204), Lacuna Fund (#67), Gordon and Betty Moore Foundation, NIH (NIBIB) MIDRC grant under contracts 75N92020C00008 and 75N92020C00021, and NHLBI Award Number R01HL167811.
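The study above reports performance as AUROC. As a hedged illustration (the labels and scores below are invented toy values, not CheXpert data), AUROC can be computed directly as the probability that a randomly chosen positive case is scored above a randomly chosen negative one:

```python
# Minimal sketch: AUROC as the rank statistic P(score_pos > score_neg).
# All values here are illustrative, not taken from the study.
def auroc(labels, scores):
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # each positive/negative pair contributes 1 for a correct ranking,
    # 0.5 for a tie, and 0 otherwise
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1, 0, 1, 1, 0]
baseline_scores = [0.2, 0.6, 0.5, 0.7, 0.3, 0.4, 0.8, 0.45]      # real data only
supplemented_scores = [0.1, 0.5, 0.6, 0.8, 0.2, 0.7, 0.9, 0.65]  # real + synthetic

print(auroc(y_true, baseline_scores))      # 0.8125
print(auroc(y_true, supplemented_scores))  # 0.9375
```

A higher AUROC for the supplemented classifier on the same held-out labels is the kind of comparison the study reports.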

3.
Sci Data ; 11(1): 535, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38789452

ABSTRACT

Pulse oximeters measure peripheral arterial oxygen saturation (SpO2) noninvasively, while the gold standard (SaO2) involves arterial blood gas measurement. There are known racial and ethnic disparities in their performance. BOLD is a dataset that aims to underscore the importance of addressing biases in pulse oximetry accuracy, which disproportionately affect darker-skinned patients. The dataset was created by harmonizing three Electronic Health Record databases (MIMIC-III, MIMIC-IV, eICU-CRD) comprising Intensive Care Unit stays of US patients. Paired SpO2 and SaO2 measurements were time-aligned and combined with various other sociodemographic characteristics and clinical parameters to provide a detailed representation of each patient. BOLD includes 49,099 paired measurements, each within a 5-minute window and with oxygen saturation levels between 70% and 100%. Minority racial and ethnic groups account for ~25% of the data, a proportion seldom achieved in previous studies. The codebase is publicly available. Given the prevalent use of pulse oximeters in the hospital and at home, we hope that BOLD will be leveraged to develop debiasing algorithms that can result in more equitable healthcare solutions.


Subject(s)
Blood Gas Analysis, Oximetry, Humans, Oxygen Saturation, Intensive Care Units, Ethnicity, Oxygen/blood
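The pairing rule described in the BOLD abstract (SpO2 and SaO2 time-aligned within a 5-minute window) can be sketched as follows. This is a hedged illustration with invented timestamps and values, not the actual BOLD harmonization code:

```python
# Sketch: for each arterial SaO2 measurement, keep the nearest SpO2
# reading in time, but only if it falls within a 5-minute window.
# All timestamps/values are illustrative, not drawn from MIMIC or eICU-CRD.
from datetime import datetime, timedelta

spo2 = [  # (timestamp, SpO2 %)
    (datetime(2024, 1, 1, 8, 0), 97),
    (datetime(2024, 1, 1, 8, 4), 95),
    (datetime(2024, 1, 1, 9, 0), 92),
]
sao2 = [  # (timestamp, SaO2 %)
    (datetime(2024, 1, 1, 8, 3), 94),   # has an SpO2 1 minute away
    (datetime(2024, 1, 1, 8, 30), 93),  # nearest SpO2 is 26 minutes away
]

WINDOW = timedelta(minutes=5)

def pair_measurements(spo2_readings, sao2_readings, window=WINDOW):
    pairs = []
    for t_a, v_a in sao2_readings:
        # nearest SpO2 reading in time
        t_s, v_s = min(spo2_readings, key=lambda x: abs(x[0] - t_a))
        if abs(t_s - t_a) <= window:
            pairs.append((v_s, v_a))
    return pairs

print(pair_measurements(spo2, sao2))  # [(95, 94)] - the 8:30 SaO2 is dropped
```

The second SaO2 value is discarded because no SpO2 reading falls inside its window, which is how a strict 5-minute criterion naturally thins the paired data.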
4.
J Imaging Inform Med ; 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38558368

ABSTRACT

In recent years, the role of Artificial Intelligence (AI) in medical imaging has become increasingly prominent, with the majority of AI applications approved by the FDA in 2023 being in imaging and radiology. The surge in AI model development to tackle clinical challenges underscores the necessity of preparing high-quality medical imaging data. Proper data preparation is crucial, as it fosters the creation of standardized and reproducible AI models while minimizing biases. Data curation transforms raw data into a valuable, organized, and dependable resource and is fundamental to the success of machine learning and analytical projects. Given the plethora of available tools for data curation at different stages, it is crucial to stay informed about the most relevant tools within specific research areas. In the current work, we propose a descriptive outline of the steps of data curation and, for each stage, furnish compilations of tools collected from a survey of members of the Society for Imaging Informatics in Medicine (SIIM). This collection has the potential to enhance the decision-making process for researchers as they select the most appropriate tool for their specific tasks.

5.
AJR Am J Roentgenol ; 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38598354

ABSTRACT

Large language models (LLMs) hold immense potential to revolutionize radiology. However, their integration into practice requires careful consideration. Artificial intelligence (AI) chatbots and general-purpose LLMs have potential pitfalls related to privacy, transparency, and accuracy, limiting their current clinical readiness. Thus, LLM-based tools must be optimized for radiology practice to overcome these limitations. While research and validation for radiology applications remain in their infancy, commercial products incorporating LLMs are becoming available alongside promises of transforming practice. To help radiologists navigate this landscape, this AJR Expert Panel Narrative Review provides a multidimensional perspective on LLMs, encompassing considerations from bench (development and optimization) to bedside (use in practice). At present, LLMs are not autonomous entities that can replace expert decision-making, and radiologists remain responsible for the content of their reports. Patient-facing tools, particularly medical AI chatbots, require additional guardrails to ensure safety and prevent misuse. Still, if responsibly implemented, LLMs are well-positioned to transform efficiency and quality in radiology. Radiologists must be well-informed and proactively involved in guiding the implementation of LLMs in practice to mitigate risks and maximize benefits to patient care.

6.
PLOS Digit Health ; 3(4): e0000474, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38620047

ABSTRACT

Despite significant technical advances in machine learning (ML) over the past several years, the tangible impact of this technology in healthcare has been limited. This is due not only to the particular complexities of healthcare, but also to structural issues in the machine learning for healthcare (MLHC) community, which broadly rewards technical novelty over tangible, equitable impact. We structure our work as a healthcare-focused echo of the 2012 paper "Machine Learning that Matters", which highlighted such structural issues in the ML community at large and offered a series of clearly defined "Impact Challenges" to which the field should orient itself. Drawing on the expertise of a diverse and international group of authors, we engage in a narrative review and examine issues in the research background environment, training processes, evaluation metrics, and deployment protocols which act to limit the real-world applicability of MLHC. Broadly, we seek to distinguish between machine learning ON healthcare data and machine learning FOR healthcare: the former sees healthcare as merely a source of interesting technical challenges, while the latter regards ML as a tool in service of meeting tangible clinical needs. We offer specific recommendations for a series of stakeholders in the field, from ML researchers and clinicians to the institutions in which they work and the governments which regulate their data access.

7.
BMC Med Ethics ; 25(1): 46, 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38637857

ABSTRACT

BACKGROUND: The ethical governance of Artificial Intelligence (AI) in health care and public health continues to be an urgent issue for attention in policy, research, and practice. In this paper we report on central themes related to challenges and strategies for promoting ethics in research involving AI in global health, arising from the Global Forum on Bioethics in Research (GFBR), held in Cape Town, South Africa in November 2022. METHODS: The GFBR is an annual meeting organized by the World Health Organization and supported by the Wellcome Trust, the US National Institutes of Health, the UK Medical Research Council (MRC) and the South African MRC. The forum aims to bring together ethicists, researchers, policymakers, research ethics committee members and other actors to engage with challenges and opportunities specifically related to research ethics. In 2022 the focus of the GFBR was "Ethics of AI in Global Health Research". The forum consisted of 6 case study presentations, 16 governance presentations, and a series of small group and large group discussions. A total of 87 participants attended the forum from 31 countries around the world, representing disciplines of bioethics, AI, health policy, health professional practice, research funding, and bioinformatics. In this paper, we highlight central insights arising from GFBR 2022. RESULTS: We describe the significance of four thematic insights arising from the forum: (1) Appropriateness of building AI, (2) Transferability of AI systems, (3) Accountability for AI decision-making and outcomes, and (4) Individual consent. We then describe eight recommendations for governance leaders to enhance the ethical governance of AI in global health research, addressing issues such as AI impact assessments, environmental values, and fair partnerships. 
CONCLUSIONS: The 2022 Global Forum on Bioethics in Research illustrated several innovations in ethical governance of AI for global health research, as well as several areas in need of urgent attention internationally. This summary is intended to inform international and domestic efforts to strengthen research ethics and support the evolution of governance leadership to meet the demands of AI in global health research.


Subject(s)
Artificial Intelligence, Bioethics, Humans, Global Health, South Africa, Research Ethics
8.
Health Promot Int ; 39(2)2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38558241

ABSTRACT

Although digital health promotion (DHP) technologies for young people are increasingly available in low- and middle-income countries (LMICs), there has been insufficient research investigating whether existing ethical and policy frameworks are adequate to address the challenges and promote the technological opportunities in these settings. In an effort to fill this gap and as part of a larger research project, in November 2022, we conducted a workshop in Cape Town, South Africa, entitled 'Unlocking the Potential of Digital Health Promotion for Young People in Low- and Middle-Income Countries'. The workshop brought together 25 experts from the areas of digital health ethics, youth health and engagement, health policy and promotion and technology development, predominantly from sub-Saharan Africa (SSA), to explore their views on the ethics and governance and potential policy pathways of DHP for young people in LMICs. Using the World Café method, participants contributed their views on (i) the advantages and barriers associated with DHP for youth in LMICs, (ii) the availability and relevance of ethical and regulatory frameworks for DHP and (iii) the translation of ethical principles into policies and implementation practices required by these policies, within the context of SSA. Our thematic analysis of the ensuing discussion revealed a willingness to foster such technologies if they prove safe, do not exacerbate inequalities, put youth at the center and are subject to appropriate oversight. In addition, our work has led to the potential translation of fundamental ethical principles into the form of a policy roadmap for ethically aligned DHP for youth in SSA.


Subject(s)
Digital Health, Health Policy, Humans, Adolescent, South Africa, Health Promotion
9.
Can Assoc Radiol J ; : 8465371241236376, 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38445497

ABSTRACT

Artificial intelligence (AI) is rapidly evolving and has transformative potential for interventional radiology (IR) clinical practice. However, formal training in AI may be limited for many clinicians and therefore presents a challenge for initial implementation and trust in AI. An understanding of the foundational concepts in AI may help familiarize the interventional radiologist with the field of AI, thus facilitating understanding and participation in the development and deployment of AI. A pragmatic classification system of AI based on the complexity of the model may guide clinicians in the assessment of AI. Finally, the current state of AI in IR and the patterns of implementation are explored (pre-procedural, intra-procedural, and post-procedural).

10.
Can Assoc Radiol J ; : 8465371241236377, 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38445517

ABSTRACT

The introduction of artificial intelligence (AI) in interventional radiology (IR) will bring about new challenges and opportunities for patients and clinicians. AI may comprise software as a medical device or AI-integrated hardware and will require a rigorous evaluation that should be guided by the level of risk of the implementation. A hierarchy of risk of harm and possible harms are described herein. A checklist to guide deployment of AI in a clinical IR environment is provided. As AI continues to evolve, regulation and evaluation of AI medical devices will need to evolve in step to keep pace and ensure patient safety.

11.
EBioMedicine ; 102: 105047, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38471396

ABSTRACT

BACKGROUND: It has been shown that AI models can learn race from medical images, leading to algorithmic bias. Our aim in this study was to enhance the fairness of medical image models by eliminating bias related to race, age, and sex. We hypothesise that models may be learning demographics via shortcut learning, and we combat this using image augmentation. METHODS: This study included 44,953 patients who identified as Asian, Black, or White (mean age, 60.68 years ±18.21; 23,499 women), for a total of 194,359 chest X-rays (CXRs) from the MIMIC-CXR database. The included CheXpert images comprised 45,095 patients (mean age 63.10 years ±18.14; 20,437 women), for a total of 134,300 CXRs, and were used for external validation. We also collected 1195 3D brain magnetic resonance imaging (MRI) scans from the ADNI database, covering 273 participants with an average age of 76.97 years ±14.22, including 142 females. DL models were trained on either non-augmented or augmented images and assessed using disparity metrics. The features learned by the models were analysed using task transfer experiments and model visualisation techniques. FINDINGS: In the detection of radiological findings, training a model using augmented CXR images reduced disparities in error rate among racial groups (-5.45%), age groups (-13.94%), and sexes (-22.22%). For Alzheimer's disease (AD) detection, the model trained with augmented MRI images showed 53.11% and 31.01% reductions of disparities in error rate among age and sex groups, respectively. Image augmentation led to a reduction in the model's ability to identify demographic attributes and resulted in the model trained for clinical purposes incorporating fewer demographic features. INTERPRETATION: The model trained using the augmented images was less likely to be influenced by demographic information in detecting image labels.
These results demonstrate that the proposed augmentation scheme could enhance the fairness of interpretations by DL models when dealing with data from patients with different demographic backgrounds. FUNDING: National Science and Technology Council (Taiwan), National Institutes of Health.


Subject(s)
Benchmarking, Learning, United States, Humans, Female, Aged, Middle Aged, Black People, Brain, Demography
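The disparity metrics in the abstract above are not fully specified; one plausible, minimal formulation is the gap between the highest and lowest per-group error rates. The sketch below is a hedged illustration with invented group names and toy predictions, not the study's actual metric:

```python
# Sketch of a simple error-rate disparity metric across demographic groups.
# Group names and predictions are illustrative only.
def error_rate(y_true, y_pred):
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def disparity(groups):
    """groups: {name: (y_true, y_pred)} -> (max-min gap, per-group rates)."""
    rates = {g: error_rate(t, p) for g, (t, p) in groups.items()}
    return max(rates.values()) - min(rates.values()), rates

groups = {
    "group_A": ([1, 0, 1, 1], [1, 0, 0, 1]),  # 1 error in 4 -> 0.25
    "group_B": ([0, 1, 0, 1], [0, 1, 0, 1]),  # 0 errors     -> 0.00
}
gap, rates = disparity(groups)
print(rates, gap)  # gap = 0.25
```

Under a metric of this shape, the reductions reported above correspond to a smaller gap for the augmented model than for the non-augmented one.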
12.
medRxiv ; 2024 Feb 27.
Article in English | MEDLINE | ID: mdl-38464170

ABSTRACT

Importance: Pulse oximetry, a ubiquitous vital sign in modern medicine, has inequitable accuracy that disproportionately affects Black and Hispanic patients, with associated increases in mortality, organ dysfunction, and oxygen therapy. Although the root cause of these clinical performance discrepancies is believed to be skin tone, previous retrospective studies used self-reported race or ethnicity as a surrogate for skin tone. Objective: To determine the utility of objectively measured skin tone in explaining pulse oximetry discrepancies. Design, Setting, and Participants: Patients admitted to Duke University Hospital were eligible for this prospective cohort study if they had pulse oximetry recorded up to 5 minutes prior to arterial blood gas (ABG) measurements. Skin tone was measured across sixteen body locations using administered visual scales (Fitzpatrick Skin Type, Monk Skin Tone, and Von Luschan), reflectance colorimetry (Delfin SkinColorCatch [L*, individual typology angle {ITA}, Melanin Index {MI}]), and reflectance spectrophotometry (Konica Minolta CM-700D [L*], Variable Spectro 1 [L*]). Main Outcomes and Measures: Mean directional bias, variability of bias, and accuracy root mean square (ARMS), comparing pulse oximetry and ABG measurements. Linear mixed-effects models were fitted to estimate mean directional bias while accounting for clinical confounders. Results: 128 patients (57 Black, 56 White) with 521 ABG-pulse oximetry pairs were recruited, none with hidden hypoxemia. Skin tone data were prospectively collected using 6 measurement methods, generating 8 measurements. The collected skin tone measurements yielded differences among each other and overlapped with self-reported racial groups, suggesting that skin tone could provide information beyond self-reported race. Among the eight skin tone measurements in this study, and compared to self-reported race, the Monk Scale had the best relationship with differences in pulse oximetry bias (point estimate: -2.40%; 95% CI: -4.32%, -0.48%; p=0.01) when comparing patients with lighter and darker skin tones. Conclusions and Relevance: We found clinical performance differences in pulse oximetry, especially in darker skin tones. Additional studies are needed to determine the relative contributions of skin tone measures and other potential factors to pulse oximetry discrepancies.
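The two accuracy measures named above have standard definitions: mean directional bias is the mean of the paired differences (SpO2 - SaO2), and ARMS is the root mean square of those differences. A minimal sketch with illustrative paired values (not study data):

```python
# Sketch: mean directional bias and accuracy root mean square (ARMS)
# for paired pulse-oximetry (SpO2) and arterial (SaO2) readings.
# The paired values below are invented for illustration.
import math

pairs = [(97, 94), (95, 96), (99, 95), (93, 94)]  # (SpO2, SaO2) in %

diffs = [spo2 - sao2 for spo2, sao2 in pairs]      # per-pair error
bias = sum(diffs) / len(diffs)                     # mean directional bias
arms = math.sqrt(sum(d * d for d in diffs) / len(diffs))  # root mean square

print(f"bias={bias:+.2f}%, ARMS={arms:.2f}%")  # bias=+1.25%, ARMS=2.60%
```

A positive bias means the oximeter tends to overestimate saturation, the pattern linked above to hidden hypoxemia risk; ARMS additionally penalizes scatter around that bias.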

13.
Curr Atheroscler Rep ; 26(4): 91-102, 2024 04.
Article in English | MEDLINE | ID: mdl-38363525

ABSTRACT

PURPOSE OF REVIEW: Bias in artificial intelligence (AI) models can result in unintended consequences. In cardiovascular imaging, biased AI models used in clinical practice can negatively affect patient outcomes. Biased AI models result from decisions made when training and evaluating a model. This paper is a comprehensive guide for AI development teams to understand assumptions in datasets and chosen metrics for outcome/ground truth, and how these translate to real-world performance for cardiovascular disease (CVD). RECENT FINDINGS: CVDs are the number one cause of mortality worldwide; however, the prevalence, burden, and outcomes of CVD vary across gender and race. Several biomarkers are also shown to vary among different populations and ethnic/racial groups. Inequalities in clinical trial inclusion, clinical presentation, diagnosis, and treatment are preserved in the health data ultimately used to train AI algorithms, leading to potential biases in model performance. Although AI models can themselves be biased, AI can also help to mitigate bias (e.g., via bias-auditing tools). In this review paper, we describe in detail implicit and explicit biases in the care of cardiovascular disease that may be present in existing datasets but are not obvious to model developers. We review disparities in CVD outcomes across different genders and racial groups, differences in the treatment of historically marginalized groups, and disparities in clinical trials for various cardiovascular diseases and outcomes. Thereafter, we summarize CVD AI literature that shows bias in CVD AI, as well as approaches in which AI is being used to mitigate such bias.


Subject(s)
Artificial Intelligence, Cardiovascular Diseases, Female, Male, Humans, Cardiovascular Diseases/diagnostic imaging, Algorithms, Bias
14.
Radiology ; 310(2): e232030, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38411520

ABSTRACT

According to the World Health Organization, climate change is the single biggest health threat facing humanity. The global health care system, including medical imaging, must manage the health effects of climate change while at the same time addressing the large amount of greenhouse gas (GHG) emissions generated in the delivery of care. Data centers and computational efforts are increasingly large contributors to GHG emissions in radiology. This is due to the explosive increase in big data and artificial intelligence (AI) applications that have resulted in large energy requirements for developing and deploying AI models. However, AI also has the potential to improve environmental sustainability in medical imaging. For example, use of AI can shorten MRI scan times with accelerated acquisition times, improve the scheduling efficiency of scanners, and optimize the use of decision-support tools to reduce low-value imaging. The purpose of this Radiology in Focus article is to discuss this duality at the intersection of environmental sustainability and AI in radiology. Further discussed are strategies and opportunities to decrease AI-related emissions and to leverage AI to improve sustainability in radiology, with a focus on health equity. Co-benefits of these strategies are explored, including lower cost and improved patient outcomes. Finally, knowledge gaps and areas for future research are highlighted.


Subject(s)
Artificial Intelligence, Radiology, Humans, Radiography, Big Data, Climate Change
15.
JMIR Med Educ ; 10: e46500, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38376896

ABSTRACT

BACKGROUND: Artificial intelligence (AI) and machine learning (ML) are poised to have a substantial impact in the health care space. While a plethora of web-based resources exist to teach programming skills and ML model development, there are few introductory curricula specifically tailored to medical students without a background in data science or programming. Programs that do exist are often restricted to a specific specialty. OBJECTIVE: We hypothesized that a 1-month elective for fourth-year medical students, composed of high-quality existing web-based resources and a project-based structure, would empower students to learn about the impact of AI and ML in their chosen specialty and begin contributing to innovation in their field of interest. This study aims to evaluate the success of this elective in improving self-reported confidence scores in AI and ML. The authors also share the curriculum with other educators who may be interested in its adoption. METHODS: This elective was offered in 2 tracks: technical (for students who were already competent programmers) and nontechnical (with no technical prerequisites, focusing on building a conceptual understanding of AI and ML). Students established a conceptual foundation of knowledge using curated web-based resources and relevant research papers, and were then tasked with completing 3 projects in their chosen specialty: a data set analysis, a literature review, and an AI project proposal. The project-based nature of the elective was designed to be self-guided and flexible to each student's interest area and career goals. Students' success was measured by self-reported confidence in AI and ML skills in pre- and post-surveys. Qualitative feedback on students' experiences was also collected. RESULTS: This web-based, self-directed elective was offered on a pass-or-fail basis each month to fourth-year students at Emory University School of Medicine beginning in May 2021. As of June 2022, a total of 19 students had successfully completed the elective, representing a wide range of chosen specialties: diagnostic radiology (n=3), general surgery (n=1), internal medicine (n=5), neurology (n=2), obstetrics and gynecology (n=1), ophthalmology (n=1), orthopedic surgery (n=1), otolaryngology (n=2), pathology (n=2), and pediatrics (n=1). Students' self-reported confidence scores for AI and ML rose by 66% after this 1-month elective. In qualitative surveys, students overwhelmingly reported enthusiasm and satisfaction with the course and commented that its self-direction, flexibility, and project-based design were essential. CONCLUSIONS: Course participants dove deep into applications of AI in their wide-ranging specialties, produced substantial project deliverables, and generally reported satisfaction with their elective experience. The authors hope that a brief, 1-month investment in AI and ML education during medical school will empower this next generation of physicians to pave the way for AI and ML innovation in health care.


Subject(s)
Artificial Intelligence, Medical Education, Humans, Curriculum, Internet, Medical Students
16.
PLOS Digit Health ; 3(2): e0000297, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38408043

ABSTRACT

Radiology-specific clinical decision support systems (CDSS) and artificial intelligence are poorly integrated into the radiologist workflow. Current research and development efforts in radiology CDSS focus on 4 main interventions, based on exam-centric time points: after image acquisition, intra-report support, post-report analysis, and radiology-workflow adjacent. We review the literature surrounding CDSS tools at these time points, requirements for CDSS workflow augmentation, and technologies that support clinician-to-computer workflow augmentation. We develop a theory of radiologist-decision tool interaction using a sequential explanatory study design. The study consists of 2 phases, the first a quantitative survey and the second a qualitative interview study. The phase 1 survey identifies differences between average users and radiologist users in software interventions using the Unified Theory of Acceptance and Use of Technology (UTAUT) framework. Phase 2 semi-structured interviews provide narratives on why these differences are found. To build this theory, we propose a novel solution called Radibot: a conversational agent capable of engaging clinicians with CDSS as an assistant, using existing instant messaging systems that support hospital communications. This work contributes an understanding of how radiologist users differ from average users and can be utilized by software developers to increase satisfaction with CDSS tools within radiology.

17.
Circulation ; 149(6): e296-e311, 2024 02 06.
Article in English | MEDLINE | ID: mdl-38193315

ABSTRACT

Multiple applications for machine learning and artificial intelligence (AI) in cardiovascular imaging are being proposed and developed. However, the processes involved in implementing AI in cardiovascular imaging are highly diverse, varying by imaging modality, patient subtype, features to be extracted and analyzed, and clinical application. This article establishes a framework that defines value from an organizational perspective, followed by value chain analysis to identify the activities in which AI might produce the greatest incremental value creation. The various perspectives that should be considered are highlighted, including clinicians, imagers, hospitals, patients, and payers. Integrating the perspectives of all health care stakeholders is critical for creating value and ensuring the successful deployment of AI tools in a real-world setting. Different AI tools are summarized, along with the unique aspects of AI applications to various cardiac imaging modalities, including cardiac computed tomography, magnetic resonance imaging, and positron emission tomography. AI is applicable and has the potential to add value to cardiovascular imaging at every step along the patient journey, from selecting the more appropriate test to optimizing image acquisition and analysis, interpreting the results for classification and diagnosis, and predicting the risk for major adverse cardiac events.


Subject(s)
American Heart Association, Artificial Intelligence, Humans, Machine Learning, Heart, Magnetic Resonance Imaging
19.
Crit Care Med ; 52(2): 345-348, 2024 02 01.
Article in English | MEDLINE | ID: mdl-38240516
20.
PLOS Digit Health ; 3(1): e0000417, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38236824

ABSTRACT

The study provides a comprehensive review of OpenAI's Generative Pre-trained Transformer 4 (GPT-4) technical report, with an emphasis on applications in high-risk settings like healthcare. A diverse team, including experts in artificial intelligence (AI), natural language processing, public health, law, policy, social science, healthcare research, and bioethics, analyzed the report against established peer review guidelines. The GPT-4 report shows a significant commitment to transparent AI research, particularly in creating a systems card for risk assessment and mitigation. However, it reveals limitations such as restricted access to training data, inadequate confidence and uncertainty estimations, and concerns over privacy and intellectual property rights. Key strengths identified include the considerable time and economic investment in transparent AI research and the creation of a comprehensive systems card. On the other hand, the lack of clarity in training processes and data raises concerns about encoded biases and interests in GPT-4. The report also lacks confidence and uncertainty estimations, crucial in high-risk areas like healthcare, and fails to address potential privacy and intellectual property issues. Furthermore, this study emphasizes the need for diverse, global involvement in developing and evaluating large language models (LLMs) to ensure broad societal benefits and mitigate risks. The paper presents recommendations such as improving data transparency, developing accountability frameworks, establishing confidence standards for LLM outputs in high-risk settings, and enhancing industry research review processes. It concludes that while GPT-4's report is a step towards open discussions on LLMs, more extensive interdisciplinary reviews are essential for addressing bias, harm, and risk concerns, especially in high-risk domains. 
The review aims to expand the understanding of LLMs in general and highlights the need for new forms of reflection on how LLMs are reviewed, the data required for effective evaluation, and how to address critical issues like bias and risk.
