Results 1 - 20 of 220
1.
Am J Epidemiol ; 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38881045

ABSTRACT

Despite increasing prevalence of hypertension in youth and high adult cardiovascular mortality rates, the long-term consequences of youth-onset hypertension remain unknown. This is due to limitations of prior research such as small sample sizes, reliance on manual record review, and limited analytic methods that did not address major biases. The Study of the Epidemiology of Pediatric Hypertension (SUPERHERO) is a multisite retrospective registry of youth evaluated by subspecialists for hypertension disorders. Sites obtain harmonized electronic health record data using standardized biomedical informatics scripts validated with randomized manual record review. Inclusion criteria are an index visit for an International Classification of Diseases Diagnostic Codes, 10th Revision (ICD-10 code)-defined hypertension disorder on or after January 1, 2015 and age <19 years. We exclude patients with ICD-10 code-defined pregnancy, kidney failure on dialysis, or kidney transplantation. Data include demographics, anthropometrics, U.S. Census Bureau tract, histories, blood pressure, ICD-10 codes, medications, laboratory and imaging results, and ambulatory blood pressure. SUPERHERO leverages expertise in epidemiology, statistics, clinical care, and biomedical informatics to create the largest and most diverse registry of youth with newly diagnosed hypertension disorders. SUPERHERO's goals are to (i) reduce cardiovascular disease (CVD) burden across the life course and (ii) establish gold-standard biomedical informatics methods for youth with hypertension disorders.
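
The inclusion and exclusion criteria stated in this abstract can be read as a simple eligibility filter. The following is a minimal sketch only: the `Visit` record, its field names, and the `is_eligible` function are hypothetical and are not taken from the registry's actual scripts or schema.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical record layout; the registry's real schema is not given in the abstract.
@dataclass
class Visit:
    index_date: date          # date of the index visit for a hypertension disorder
    age_years: float          # age at the index visit
    excluded_condition: bool  # ICD-10-defined pregnancy, dialysis, or kidney transplantation


COHORT_START = date(2015, 1, 1)  # index visit must fall on or after this date
MAX_AGE_YEARS = 19               # age must be strictly below this value


def is_eligible(visit: Visit) -> bool:
    """Apply the criteria as stated: date on/after 2015-01-01, age < 19, no exclusions."""
    return (
        visit.index_date >= COHORT_START
        and visit.age_years < MAX_AGE_YEARS
        and not visit.excluded_condition
    )
```

In practice the exclusion flag would presumably be derived from ICD-10 codes in the record; that mapping is not described in the abstract and is left out here.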

2.
Life (Basel) ; 14(6)2024 May 21.
Article in English | MEDLINE | ID: mdl-38929638

ABSTRACT

Artificial intelligence models represented in machine learning algorithms are promising tools for risk assessment used to guide clinical and other health care decisions. Machine learning algorithms, however, may house biases that propagate stereotypes, inequities, and discrimination that contribute to socioeconomic health care disparities. The biases include those related to some sociodemographic characteristics such as race, ethnicity, gender, age, insurance, and socioeconomic status from the use of erroneous electronic health record data. Additionally, there is concern that training data and algorithmic biases in large language models pose potential drawbacks. These biases affect the lives and livelihoods of a significant percentage of the population in the United States and globally. The social and economic consequences of the associated backlash should not be underestimated. Here, we outline some of the sociodemographic, training data, and algorithmic biases that undermine sound health care risk assessment and medical decision-making and that should be addressed in the health care system. We present a perspective and overview of these biases, covering gender, race, ethnicity, age, historically marginalized communities, algorithmic bias, biased evaluations, implicit bias, selection/sampling bias, socioeconomic status bias, biased data distributions, cultural bias, insurance status bias, confirmation bias, information bias, and anchoring bias. We also make recommendations to improve large language model training data, including de-biasing techniques such as counterfactual role-reversed sentences during knowledge distillation, fine-tuning, prefix attachment at training time, the use of toxicity classifiers, retrieval augmented generation, and algorithmic modification to mitigate these biases moving forward.

3.
J Biomed Inform ; 154: 104653, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38734158

ABSTRACT

Many approaches in biomedical informatics (BMI) rely on the ability to define, gather, and manipulate biomedical data to support health through a cyclical research-practice lifecycle. Researchers within this field are often fortunate to work closely with healthcare and public health systems to influence data generation and capture and have access to a vast amount of biomedical data. Many informaticists also have the expertise to engage with stakeholders, develop new methods and applications, and influence policy. However, research and policy that explicitly seeks to address the systemic drivers of health would more effectively support health. Intersectionality is a theoretical framework that can facilitate such research. It holds that individual human experiences reflect larger socio-structural level systems of privilege and oppression, and cannot be truly understood if these systems are examined in isolation. Intersectionality explicitly accounts for the interrelated nature of systems of privilege and oppression, providing a lens through which to examine and challenge inequities. In this paper, we propose intersectionality as an intervention into how we conduct BMI research. We begin by discussing intersectionality's history and core principles as they apply to BMI. We then elaborate on the potential for intersectionality to stimulate BMI research. Specifically, we posit that our efforts in BMI to improve health should address intersectionality's five key considerations: (1) systems of privilege and oppression that shape health; (2) the interrelated nature of upstream health drivers; (3) the nuances of health outcomes within groups; (4) the problematic and power-laden nature of categories that we assign to people in research and in society; and (5) research to inform and support social change.


Subject(s)
Medical Informatics , Humans , Medical Informatics/methods , Biomedical Research
4.
ArXiv ; 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38562449

ABSTRACT

The year 2023 marked a significant surge in the exploration of applying large language model (LLM) chatbots, notably ChatGPT, across various disciplines. We surveyed the applications of ChatGPT in bioinformatics and biomedical informatics throughout the year, covering omics, genetics, biomedical text mining, drug discovery, biomedical image understanding, bioinformatics programming, and bioinformatics education. Our survey delineates the current strengths and limitations of this chatbot in bioinformatics and offers insights into potential avenues for future developments.

5.
J Clin Transl Sci ; 8(1): e39, 2024.
Article in English | MEDLINE | ID: mdl-38476245

ABSTRACT

Objective: Social Determinants of Health (SDOH) greatly influence health outcomes. SDOH surveys, such as the Assessing Circumstances & Offering Resources for Needs (ACORN) survey, have been developed to screen for SDOH in Veterans. The purpose of this study is to determine the terminological representation of the ACORN survey, to aid in natural language processing (NLP). Methods: Each ACORN survey question was read to determine its concepts. Next, Solor was searched for each of the concepts and for the appropriate attributes. If no attributes or concepts existed, they were proposed. Then, each question's concepts and attributes were arranged into subject-relation-object triples. Results: Eleven unique attributes and 18 unique concepts were proposed. These results demonstrate a gap in representing SDOH with terminologies. We believe that using these new concepts and relations will improve NLP, and thus, the care provided to Veterans.

7.
J Clin Med ; 13(4)2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38398389

ABSTRACT

Bronchopulmonary dysplasia (BPD), a chronic lung disease predominantly affecting premature infants, poses substantial clinical challenges. This review delves into the promise of biomedical informatics (BMI) in reshaping BPD research and care. We commence by highlighting the escalating prevalence and healthcare impact of BPD, emphasizing the necessity for innovative strategies to comprehend its intricate nature. To this end, we introduce BMI as a potent toolset adept at managing and analyzing extensive, diverse biomedical data. The challenges intrinsic to BPD research are addressed, underscoring the inadequacies of conventional approaches and the compelling need for data-driven solutions. We subsequently explore how BMI can revolutionize BPD research, encompassing genomics and personalized medicine to reveal potential biomarkers and individualized treatment strategies. Predictive analytics emerges as a pivotal facet of BMI, enabling early diagnosis and risk assessment for timely interventions. Moreover, we examine how mobile health technologies facilitate real-time monitoring and enhance patient engagement, ultimately refining BPD management. Ethical and legal considerations surrounding BMI implementation in BPD research are discussed, accentuating issues of privacy, data security, and informed consent. In summation, this review highlights BMI's transformative potential in advancing BPD research, addressing challenges, and opening avenues for personalized medicine and predictive analytics.

8.
Stud Health Technol Inform ; 310: 690-694, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269897

ABSTRACT

Few-shot learning (FSL) is a category of machine learning models that are designed with the intent of solving problems that have small amounts of labeled data available for training. FSL research progress in natural language processing (NLP), particularly within the medical domain, has been notably slow, primarily due to greater difficulties posed by domain-specific characteristics and data sparsity problems. We explored the use of novel methods for text representation and encoding combined with distance-based measures for improving FSL entity detection. In this paper, we propose a data augmentation method to incorporate semantic information from medical texts into the learning process and combine it with a nearest-neighbor classification strategy for predicting entities. Experiments performed on five biomedical text datasets demonstrate that our proposed approach often outperforms other approaches.
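
As an illustration of the nearest-neighbor strategy mentioned in this abstract, the sketch below assigns a query vector the label of its most similar support example. It is not the authors' method: the choice of cosine similarity, the toy two-dimensional vectors, and the function names are all assumptions made for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbor_label(query, support):
    """Return the label of the support example most similar to the query.

    `support` is a list of (vector, label) pairs, i.e. the few labeled examples.
    """
    if not support:
        raise ValueError("support set must not be empty")
    _, best_label = max(support, key=lambda item: cosine_similarity(query, item[0]))
    return best_label


# Toy support set with made-up embeddings and entity labels.
support_set = [([1.0, 0.0], "DRUG"), ([0.0, 1.0], "DISEASE")]
predicted = nearest_neighbor_label([0.9, 0.1], support_set)
```

With only a handful of labeled examples per class, a distance-based rule like this avoids fitting many parameters, which is one common motivation for it in few-shot settings.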


Subject(s)
Intention , Names , Cluster Analysis , Machine Learning , Natural Language Processing
9.
Data Brief ; 50: 109618, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37808542

ABSTRACT

The dataset described is an aspect-level sentiment analysis dataset for therapies, including medication, behavioral and other therapies, created by leveraging user-generated text from Twitter. The dataset was constructed by collecting Twitter posts using keywords associated with the therapies (often referred to as treatments). Subsequently, subsets of the collected posts were manually reviewed, and annotation guidelines were developed to categorize the posts as positive, negative, or neutral. The dataset contains a total of 5364 posts mentioning 32 therapies. These posts are further categorized manually into 998 (18.6%) positive, 619 (11.5%) negative, and 3747 (69.9%) neutral sentiments. The inter-annotator agreement for the dataset was evaluated using Cohen's Kappa, which yielded a score of 0.82. The potential use of this dataset lies in the development of automatic systems that can detect users' sentiments toward therapies based on their posts. While there are other sentiment analysis datasets available, this is the first that encodes sentiments associated with specific therapies. Researchers and developers can utilize this dataset to train sentiment analysis models, natural language processing algorithms, or machine learning systems to accurately identify and analyze the sentiments expressed by consumers on social media platforms like Twitter.
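
For readers unfamiliar with the agreement statistic cited in this abstract, the following minimal sketch computes Cohen's kappa for two annotators from its standard definition, kappa = (p_o - p_e) / (1 - p_e). The function name and the toy labels are illustrative assumptions; they are not the dataset's annotations.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which the annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical category; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy annotations (made up for illustration).
annotator_1 = ["positive", "positive", "negative", "negative"]
annotator_2 = ["positive", "negative", "negative", "negative"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Kappa corrects raw agreement for the agreement expected by chance, which matters for a dataset such as this one where one class (neutral) dominates.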

10.
Standards (Basel) ; 3(3): 316-340, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37873508

ABSTRACT

The translational research community, in general, and the Clinical and Translational Science Awards (CTSA) community, in particular, share the vision of repurposing electronic health records (EHRs) for research that will improve the quality of clinical practice. Many members of these communities are also aware that EHRs suffer from limitations: their data become poorly structured, biased, and unusable outside the original context. This creates obstacles to the continuity of care, utility, quality improvement, and translational research. Analogous limitations to sharing objective data in other areas of the natural sciences have been successfully overcome by developing and using common ontologies. This White Paper presents the authors' rationale for the use of ontologies with computable semantics for the improvement of clinical data quality and EHR usability. It is formulated for researchers who have a stake in clinical and translational science and who advocate the use of information technology in medicine but at the same time are concerned by current major shortfalls. This White Paper outlines pitfalls, opportunities, and solutions and recommends increased investment in research and development of ontologies with computable semantics for a new generation of EHRs.

11.
Curr Med Chem ; 2023 Sep 13.
Article in English | MEDLINE | ID: mdl-37711014

ABSTRACT

Gastric cancer (GC) represents a significant global health burden, ranking as the fifth most common malignancy and the fourth leading cause of cancer-related death worldwide. Despite recent advancements in GC treatment, the five-year survival rate for advanced-stage GC patients remains low. Consequently, there is an urgent need to identify novel drug targets and develop effective therapies. However, traditional drug discovery approaches are associated with high costs, time-consuming processes, and a high failure rate, posing challenges in meeting this critical need. In recent years, there has been a rapid increase in the utilization of artificial intelligence (AI) algorithms and big data in drug discovery, particularly in cancer research. AI has the potential to improve the drug discovery process by analyzing vast and complex datasets from multiple sources, enabling the prediction of compound efficacy and toxicity, as well as the optimization of drug candidates. This review provides an overview of the latest AI algorithms and big data employed in drug discovery for GC. Additionally, we examine the various applications of AI in this field, with a specific focus on therapeutic discovery. Moreover, we discuss the challenges, limitations, and prospects of emerging AI methods, which hold significant promise for advancing GC research in the future.

12.
Transl Pediatr ; 12(6): 1213-1224, 2023 Jun 30.
Article in English | MEDLINE | ID: mdl-37427053

ABSTRACT

Background and Objective: Bronchopulmonary dysplasia (BPD) is the most common morbidity associated with prematurity and remains a significant clinical challenge. Bioinformatic approaches, such as genomics, transcriptomics, and proteomics, have emerged as novel methods for studying the underlying mechanisms driving BPD pathogenesis. These methods can be used alongside clinical data to develop a better understanding of BPD and potentially identify the most at-risk neonates within the first few weeks of neonatal life. The objective of this review is to provide an overview of the current state of the art in bioinformatics for BPD research. Methods: We conducted a literature review of bioinformatics approaches for BPD using PubMed. The following keywords were used: "biomedical informatics", "bioinformatics", "bronchopulmonary dysplasia", and "omics". Key Content and Findings: This review highlighted the importance of omics approaches for better understanding BPD and identified potential avenues for future research. We described the use of machine learning (ML) and the need for systems biology methods for integrating large-scale data from multiple tissues. We summarized a handful of studies that utilized bioinformatics for BPD to show where the field currently stands, identified areas of ongoing research, and concluded with the challenges that remain. Conclusions: Bioinformatics has the potential to enable a more comprehensive understanding of BPD pathogenesis, facilitating a personalized and precise approach to neonatal care. As we continue to push the boundaries of biomedical research, biomedical informatics (BMI) will undoubtedly play a key role in unraveling new frontiers in disease understanding, prevention, and treatment.

13.
BMC Bioinformatics ; 24(1): 290, 2023 Jul 19.
Article in English | MEDLINE | ID: mdl-37468830

ABSTRACT

BACKGROUND: The growing recognition of the microbiome's impact on human health and well-being has prompted extensive research into discovering the links between microbiome dysbiosis and disease (healthy) states. However, this valuable information is scattered in unstructured form within biomedical literature. The structured extraction and qualification of microbe-disease interactions are important. In parallel, recent advancements in deep-learning-based natural language processing algorithms have revolutionized language-related tasks such as ours. This study aims to leverage state-of-the-art deep-learning language models to extract microbe-disease relationships from biomedical literature. RESULTS: In this study, we first evaluate multiple pre-trained large language models within a zero-shot or few-shot learning context. In this setting, the models performed poorly out of the box, emphasizing the need for domain-specific fine-tuning of these language models. Subsequently, we fine-tune multiple language models (specifically, GPT-3, BioGPT, BioMedLM, BERT, BioMegatron, PubMedBERT, BioClinicalBERT, and BioLinkBERT) using labeled training data and evaluate their performance. Our experimental results demonstrate the state-of-the-art performance of these fine-tuned models (specifically GPT-3, BioMedLM, and BioLinkBERT), achieving an average F1 score, precision, and recall of over [Formula: see text] compared to the previous best of 0.74. CONCLUSION: Overall, this study establishes that pre-trained language models excel as transfer learners when fine-tuned with domain and problem-specific data, enabling them to achieve state-of-the-art results even with limited training data for extracting microbiome-disease interactions from scientific publications.
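
The three metrics reported in this abstract are related by standard definitions, sketched below for a single class. The function name and the counts in the example call are hypothetical and do not come from the study.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Precision, recall, and F1 for one class, computed from raw counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts for an extracted relation type.
precision, recall, f1 = precision_recall_f1(true_pos=8, false_pos=2, false_neg=2)
```

Because F1 is a harmonic mean, it stays low when either precision or recall is low, which is why it is often reported alongside both in relation-extraction work.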


Subject(s)
Algorithms , Language , Humans , Natural Language Processing , Health Status , Learning
14.
Comput Methods Programs Biomed ; 240: 107719, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37453366

ABSTRACT

BACKGROUND AND OBJECTIVE: Prostate cancer (PCa) is one of the most prevalent forms of cancer in men worldwide. Traditional screening strategies such as serum PSA levels, which are not necessarily cancer-specific, or digital rectal exams, which are often inconclusive, are still the screening methods used for the disease. Some studies have focused on identifying biomarkers of the disease, but none have been reported for diagnosis in routine clinical practice, and few studies have provided tools to assist the pathologist in the decision-making process when analyzing prostate tissue. Therefore, a classifier is proposed to predict the occurrence of PCa that provides physicians with accurate predictions and understandable explanations. METHODS: A selection of 47 genes was made based on differential expression between PCa and normal tissue, the Gene Ontology (GO), and the literature, to be used as input predictors for different machine learning methods based on eXplainable Artificial Intelligence. These methods were trained using different class-balancing strategies to build accurate classifiers using gene expression data from 550 samples from 'The Cancer Genome Atlas'. Our model was validated in four external cohorts with different ancestries, totaling 463 samples. In addition, a set of SHapley Additive exPlanations was provided to help clinicians understand the underlying reasons for each decision. RESULTS: An in-depth analysis showed that the Random Forest algorithm combined with majority class downsampling was the best performing approach with robust statistical significance. Our method achieved an average sensitivity and specificity of 0.90 and 0.8 with an AUC of 0.84 across all databases. The relevance of DLX1, MYL9 and FGFR genes for PCa screening was demonstrated in addition to the important role of novel genes such as CAV2 and MYLK.
CONCLUSIONS: This model has shown good performance in 4 independent external cohorts of different ancestries and the explanations provided are consistent with each other and with the literature, opening a horizon for its application in clinical practice. In the near future, these genes, in combination with our model, could be applied to liquid biopsy to improve PCa screening.


Subject(s)
Artificial Intelligence , Prostatic Neoplasms , Male , Humans , Prostatic Neoplasms/genetics , Sensitivity and Specificity , Gene Expression
15.
J Biomed Inform ; 144: 104458, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37488023

ABSTRACT

BACKGROUND: Few-shot learning (FSL) is a class of machine learning methods that require small numbers of labeled instances for training. With many medical topics having limited annotated text-based data in practical settings, FSL-based natural language processing (NLP) holds substantial promise. We aimed to conduct a review to explore the current state of FSL methods for medical NLP. METHODS: We searched for articles published between January 2016 and October 2022 using PubMed/Medline, Embase, ACL Anthology, and IEEE Xplore Digital Library. We also searched the preprint servers (e.g., arXiv, medRxiv, and bioRxiv) via Google Scholar to identify the latest relevant methods. We included all articles that involved FSL and any form of medical text. We abstracted articles based on the data source, target task, training set size, primary method(s)/approach(es), and evaluation metric(s). RESULTS: Fifty-one articles met our inclusion criteria, all published after 2018, and most since 2020 (42/51; 82%). Concept extraction/named entity recognition was the most frequently addressed task (21/51; 41%), followed by text classification (16/51; 31%). Thirty-two (61%) articles reconstructed existing datasets to fit few-shot scenarios, and MIMIC-III was the most frequently used dataset (10/51; 20%). Seventy-seven percent of the articles attempted to incorporate prior knowledge to augment the small datasets available for training. Common methods included FSL with attention mechanisms (20/51; 39%), prototypical networks (11/51; 22%), meta-learning (7/51; 14%), and prompt-based learning methods, the latter being particularly popular since 2021. Benchmarking experiments demonstrated relative underperformance of FSL methods on biomedical NLP tasks. CONCLUSION: Despite the potential for FSL in biomedical NLP, progress has been limited. This may be attributed to the rarity of specialized data, lack of standardized evaluation criteria, and the underperformance of FSL methods on biomedical topics.
The creation of publicly-available specialized datasets for biomedical FSL may aid method development by facilitating comparative analyses.


Subject(s)
Machine Learning , Natural Language Processing , PubMed , MEDLINE , Publications
16.
Front Clin Diabetes Healthc ; 4: 1227105, 2023.
Article in English | MEDLINE | ID: mdl-37351484

ABSTRACT

[This corrects the article DOI: 10.3389/fcdhc.2023.1095859.].

17.
Stud Health Technol Inform ; 305: 176-179, 2023 Jun 29.
Article in English | MEDLINE | ID: mdl-37386989

ABSTRACT

Our study contributes to the history of international medical informatics through investigating the thematic evolution of the MEDINFO conferences during a period of consolidation and expansion of the discipline. The themes are examined and potential factors influencing the evolutionary developments are discussed.


Subject(s)
Medical Informatics
18.
J Med Internet Res ; 25: e44047, 2023 Jun 21.
Article in English | MEDLINE | ID: mdl-37342078

ABSTRACT

BACKGROUND: Testicular sperm extraction (TESE) is an essential therapeutic tool for the management of male infertility. However, it is an invasive procedure with a success rate of up to 50%. To date, no model based on clinical and laboratory parameters is sufficiently powerful to accurately predict the success of sperm retrieval in TESE. OBJECTIVE: The aim of this study is to compare a wide range of predictive models under similar conditions for TESE outcomes in patients with nonobstructive azoospermia (NOA) to identify the correct mathematical approach to apply, the most appropriate study size, and the relevance of the input biomarkers. METHODS: We analyzed 201 patients who underwent TESE at Tenon Hospital (Assistance Publique-Hôpitaux de Paris, Sorbonne University, Paris), distributed in a retrospective training cohort of 175 patients (January 2012 to April 2021) and a prospective testing cohort (May 2021 to December 2021) of 26 patients. Preoperative data (according to the French standard exploration of male infertility, 16 variables) including urogenital history, hormonal data, genetic data, and TESE outcomes (representing the target variable) were collected. A TESE was considered positive if we obtained sufficient spermatozoa for intracytoplasmic sperm injection. After preprocessing the raw data, 8 machine learning (ML) models were trained and optimized on the retrospective training cohort data set. Hyperparameter tuning was performed by random search. Finally, the prospective testing cohort data set was used for the model evaluation. The metrics used to evaluate and compare the models were the following: sensitivity, specificity, area under the receiver operating characteristic curve (AUC-ROC), and accuracy. The importance of each variable in the model was assessed using the permutation feature importance technique, and the optimal number of patients to include in the study was assessed using the learning curve.
RESULTS: The ensemble models, based on decision trees, showed the best performance, especially the random forest model, which yielded the following results: AUC=0.90, sensitivity=100%, and specificity=69.2%. Furthermore, a study size of 120 patients seemed sufficient to properly exploit the preoperative data in the modeling process, since increasing the number of patients beyond 120 during model training did not bring any performance improvement. Furthermore, inhibin B and a history of varicoceles exhibited the highest predictive capacity. CONCLUSIONS: An ML algorithm based on an appropriate approach can predict successful sperm retrieval in men with NOA undergoing TESE, with promising performance. However, although this study is consistent with the first step of this process, a subsequent formal prospective multicentric validation study should be undertaken before any clinical applications. As future work, we consider the use of recent and clinically relevant data sets (including seminal plasma biomarkers, especially noncoding RNAs, as markers of residual spermatogenesis in NOA patients) to improve our results even more.
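
Three of the evaluation metrics named in this abstract (sensitivity, specificity, and accuracy) follow directly from a binary confusion matrix, as the minimal sketch below shows. The function name is an assumption, and the counts in the example are hypothetical: they were chosen only because they reproduce percentages of the kind reported, and they are not stated in the abstract.

```python
def confusion_metrics(true_pos, false_pos, true_neg, false_neg):
    """Sensitivity, specificity, and accuracy from binary confusion-matrix counts."""
    actual_pos = true_pos + false_neg
    actual_neg = true_neg + false_pos
    total = actual_pos + actual_neg
    if actual_pos == 0 or actual_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": true_pos / actual_pos,  # true positive rate
        "specificity": true_neg / actual_neg,  # true negative rate
        "accuracy": (true_pos + true_neg) / total,
    }


# Hypothetical counts for a 26-patient test set (not reported in the abstract).
metrics = confusion_metrics(true_pos=13, false_pos=4, true_neg=9, false_neg=0)
```

A sensitivity of 100% paired with a lower specificity, as in this kind of result, means no successful retrieval would be missed, at the cost of some false positive predictions.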


Subject(s)
Azoospermia , Infertility, Male , Humans , Male , Azoospermia/diagnosis , Azoospermia/therapy , Semen , Retrospective Studies , Prospective Studies , Spermatozoa , Algorithms
19.
Front Clin Diabetes Healthc ; 4: 1095859, 2023.
Article in English | MEDLINE | ID: mdl-37138580

ABSTRACT

Background: Hypoglycemia is the most common adverse consequence of treating diabetes, and is often due to suboptimal patient self-care. Behavioral interventions by health professionals and self-care education help avoid recurrent hypoglycemic episodes by targeting problematic patient behaviors. This relies on time-consuming investigation of the reasons behind the observed episodes, which involves manual interpretation of personal diabetes diaries and communication with patients. Therefore, there is a clear motivation to automate this process using a supervised machine learning paradigm. This manuscript presents a feasibility study of automatic identification of hypoglycemia causes. Methods: Reasons for 1885 hypoglycemia events were labeled by 54 participants with type 1 diabetes over a 21-month period. A broad range of possible predictors describing a hypoglycemic episode and the subject's general self-care were extracted from participants' routinely collected data on the Glucollector, their diabetes management platform. Thereafter, the possible hypoglycemia reasons were categorized for two major analysis sections: statistical analysis of relationships between the data features of self-care and hypoglycemia reasons, and classification analysis investigating the design of an automated system to determine the reason for hypoglycemia. Results: Physical activity accounted for 45% of the hypoglycemia reasons in the real-world data. The statistical analysis provided a number of interpretable predictors of different hypoglycemia reasons based on self-care behaviors. The classification analysis showed the performance of a reasoning system in practical settings with different objectives under F1-score, recall and precision metrics. Conclusion: The data acquisition characterized the incidence distribution of the various hypoglycemia reasons. The analyses highlighted many interpretable predictors of the various hypoglycemia types.
Also, the feasibility study presented a number of concerns valuable in the design of the decision support system for automatic hypoglycemia reason classification. Therefore, automating the identification of the causes of hypoglycemia may help objectively to target behavioral and therapeutic changes in patients' care.
