1 - 20 of 949
1.
BMC Med Educ ; 24(1): 564, 2024 May 23.
Article En | MEDLINE | ID: mdl-38783229

BACKGROUND: Health Data Science (HDS) is a novel interdisciplinary field that integrates biological, clinical, and computational sciences with the aim of analysing clinical and biological data through the utilisation of computational methods. Training healthcare specialists who are knowledgeable in both health and data sciences is much needed, important, and challenging. Therefore, it is essential to analyse students' learning experiences through artificial intelligence techniques in order to provide both teachers and learners with insights about effective learning strategies and to improve existing HDS course designs. METHODS: We applied artificial intelligence methods to uncover learning tactics and strategies employed by students in an HDS massive open online course with over 3,000 students enrolled. We also used statistical tests to explore students' engagement with different resources (such as reading materials and lecture videos) and their level of engagement with various HDS topics. RESULTS: We found that students in HDS employed four learning tactics: actively connecting new information to their prior knowledge, taking assessments and practising programming to evaluate their understanding, collaborating with their classmates, and repeating information to memorise it. Based on the employed tactics, we also found three types of learning strategies: low engagement (Surface learners), moderate engagement (Strategic learners), and high engagement (Deep learners), which are in line with well-known educational theories. The results indicate that successful students allocate more time to practical topics, such as projects and discussions, make connections among concepts, and employ peer learning. CONCLUSIONS: We applied artificial intelligence techniques to provide new insights into HDS education. Based on the findings, we provide pedagogical suggestions not only for course designers but also for teachers and learners that have the potential to improve the learning experience of HDS students.


Artificial Intelligence , Data Science , Humans , Data Science/education , Curriculum , Learning
2.
AMA J Ethics ; 26(4): E306-314, 2024 Apr 01.
Article En | MEDLINE | ID: mdl-38564745

Drug shortages are a persistent and serious problem in the United States, affecting patient care and health care costs. This article canvasses factors that contribute to drug shortages, such as manufacturing complexity, price, and quality inspection records. This article further proposes an early warning system and payment, contracting, and pricing innovations to mitigate drug shortages and offers data-driven recommendations to stakeholders looking to protect the supply of quality medicines.


Data Science , Drug Industry , Humans , United States , Health Care Costs
3.
BMC Med Imaging ; 24(1): 86, 2024 Apr 10.
Article En | MEDLINE | ID: mdl-38600525

Medical imaging AI systems and big data analytics have attracted much attention from researchers in industry and academia. The application of medical imaging AI systems and big data analytics plays an important role in the development of content-based remote sensing (CBRS) technology. Environmental data, information, and analysis have been produced promptly using remote sensing (RS). The method for creating a useful digital map from an image data set is called image information extraction. Image information extraction depends on target recognition (shape and color). For low-level image attributes like texture, Classifier-based Retrieval (CR) techniques are ineffective since they categorize the input images and only return images from the determined classes of RS. The issues mentioned earlier cannot be handled by the existing expertise based on a keyword/metadata remote sensing data service model. To get over these restrictions, Fuzzy Class Membership-based Image Extraction (FCMIE), a technology developed for Content-Based Remote Sensing (CBRS), is suggested. The compensation fuzzy neural network (CFNN) is used to calculate the category label and fuzzy category membership of the query image. A basic and balanced weighted distance metric is used. Feature information extraction (FIE) enhances remote sensing image processing and autonomous information retrieval of visual content based on time-frequency meaning, such as color, texture and shape attributes of images. Hierarchical nested structure and cyclic similarity measure produce faster queries when searching. The experiment's findings indicate that applying the proposed model can have favorable outcomes for assessment measures, including Ratio of Coverage, average means precision, recall, and retrieval efficiency, which are attained more effectively than with the existing CR model. In the areas of feature tracking, climate forecasting, background noise reduction, and simulating nonlinear functional behaviors, CFNN has a wide range of RS applications. The proposed method CFNN-FCMIE achieves a minimum range of 4-5% for all three feature vectors, sample mean and comparison precision-recall ratio, which gives better results than the existing classifier-based retrieval model. This work provides an important reference for medical imaging artificial intelligence systems and big data analysis.


Artificial Intelligence , Remote Sensing Technology , Humans , Data Science , Information Storage and Retrieval , Neural Networks, Computer
4.
Environ Sci Technol ; 58(15): 6457-6474, 2024 Apr 16.
Article En | MEDLINE | ID: mdl-38568682

The circular economy (CE) aims to decouple the growth of the economy from the consumption of finite resources through strategies such as eliminating waste, circulating materials in use, and regenerating natural systems. Due to the rapid development of data science (DS), promising progress has been made in the transition toward CE in the past decade. DS offers various methods to achieve accurate predictions, accelerate sustainable product design, prolong asset life, optimize the infrastructure needed to circulate materials, and provide evidence-based insights. Despite the exciting scientific advances in this field, a comprehensive review that summarizes past achievements, synthesizes the knowledge gained, and navigates future research directions is still lacking. In this paper, we try to summarize how DS accelerated the transition to CE. We conducted a critical review of where and how DS has helped the CE transition with a focus on four areas including (1) characterizing socioeconomic metabolism, (2) reducing unnecessary waste generation by enhancing material efficiency and optimizing product design, (3) extending product lifetime through repair, and (4) facilitating waste reuse and recycling. We also introduced the limitations and challenges in the current applications and discussed opportunities to provide a clear roadmap for future research in this field.


Data Science , Waste Management , Recycling
5.
RECIIS (Online) ; 18(1), Jan.-Mar. 2024.
Article Pt | LILACS, ColecionaSUS | ID: biblio-1553650

This study aims to identify, in the scientific literature, products and services developed by librarians with a view to Open Science practices. The main question is: what role do librarians play in facing the challenges of Open Science? Predominantly qualitative, this research can be characterized as bibliographic, exploratory, and descriptive. To achieve its objective, a rapid literature review technique was used. A survey of publications indexed in Brapci, Scopus and Web of Science was carried out, and three publications from each were retrieved. After excluding one title that was repeated, the research corpus consisted of six articles and two abstracts presented at an event. We conclude that debates about the new modus operandi of doing science have been increasing and librarians seem closely related to Open Science actions in the various stages of scientific research. Because of their skills and services, they play one of the central roles in achieving the opening of science.


Librarians , Access to Information , Information Dissemination , Open Access Publishing , Data Science , Information Services , Database , Education , Scientific Communication and Diffusion
7.
BMC Palliat Care ; 23(1): 62, 2024 Mar 02.
Article En | MEDLINE | ID: mdl-38429698

BACKGROUND: Breakthrough cancer pain (BTCP) is primarily managed at home and can stem from physical exertion and emotional distress triggers. Beyond these triggers, the impact of the ambient environment on pain occurrence and intensity has not been investigated. This study explores the impact of environmental factors on the frequency and severity of BTCP in the home context from the perspective of patients with advanced cancer and their primary family caregivers. METHODS: A health monitoring system was deployed in the homes of patient and family caregiver dyads to collect self-reported pain events and contextual environmental data (light, temperature, humidity, barometric pressure, ambient noise). Correlation analysis examined the relationship between environmental factors and: 1) individually reported pain episodes and 2) overall pain trends in a 24-hour time window. Machine learning models were developed to explore how environmental factors may predict BTCP episodes. RESULTS: Variability in correlation strength between environmental variables and pain reports among dyads was found. Light and noise show moderate association (r = 0.50-0.70) in 66% of total deployments. The strongest correlation for individual pain events involved barometric pressure (r = 0.90); for pain trends over 24 hours the strongest correlations involved humidity (r = 0.84) and barometric pressure (r = 0.83). Machine learning achieved 70% BTCP prediction accuracy. CONCLUSION: Our study provides insights into the role of ambient environmental factors in BTCP and offers novel opportunities to inform personalized pain management strategies and to remotely support patients and their caregivers in self-symptom management. This research provides preliminary evidence of the impact of ambient environmental factors on BTCP in the home setting. We utilized real-world data and correlation analysis to provide an understanding of the relationship between environmental factors and cancer pain, which may be helpful to others engaged in similar work.
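
The correlation analysis described in this abstract can be illustrated with a minimal sketch of the Pearson coefficient r. The abstract does not say which correlation measure or software was used, so Pearson's r is an assumption here, and the humidity and pain values below are made up for illustration; they are not study data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations, and the two sums of squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical readings for one home deployment (illustrative values only).
humidity_pct = [40, 45, 50, 55, 60, 52]
pain_score = [2, 3, 5, 6, 7, 5]

r = pearson_r(humidity_pct, pain_score)
print(f"r = {r:.2f}")
```

A value of r near +1 or -1 indicates a strong linear association in the sample; it does not by itself show that the environmental factor causes the pain episodes.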


Breakthrough Pain , Cancer Pain , Neoplasms , Humans , Analgesics, Opioid , Data Science , Pain Management , Neoplasms/complications
8.
J Glob Health ; 14: 04070, 2024 Mar 29.
Article En | MEDLINE | ID: mdl-38547497

Background: OpenAI's Chat Generative Pre-trained Transformer 4.0 (ChatGPT-4), an emerging artificial intelligence (AI)-based large language model (LLM), has been receiving increasing attention from the medical research community for its innovative 'Data Analyst' feature. We aimed to compare the capabilities of ChatGPT-4 against traditional biostatistical software (i.e. SAS, SPSS, R) in statistically analysing epidemiological research data. Methods: We used a data set from the China Health and Nutrition Survey, comprising 9317 participants and 29 variables (e.g. gender, age, educational level, marital status, income, occupation, weekly working hours, survival status). Two researchers independently evaluated the data analysis capabilities of GPT-4's 'Data Analyst' feature against SAS, SPSS, and R across three commonly used epidemiological analysis methods: Descriptive statistics, intergroup analysis, and correlation analysis. We used an internally developed evaluation scale to assess and compare the consistency of results, analytical efficiency of coding or operations, user-friendliness, and overall performance between ChatGPT-4, SAS, SPSS, and R. Results: In descriptive statistics, ChatGPT-4 showed high consistency of results, greater analytical efficiency of code or operations, and more intuitive user-friendliness compared to SAS, SPSS, and R. In intergroup comparisons and correlational analyses, despite minor discrepancies in statistical outcomes for certain analysis tasks with SAS, SPSS, and R, ChatGPT-4 maintained high analytical efficiency and exceptional user-friendliness. Thus, employing ChatGPT-4 can significantly lower the operational threshold for conducting epidemiological data analysis while maintaining consistency with traditional biostatistical software's outcome, requiring only specific, clear analysis instructions without any additional operations or code writing. 
Conclusions: We found ChatGPT-4 to be a powerful auxiliary tool for statistical analysis in epidemiological research. However, it showed limitations in result consistency and in applying more advanced statistical methods. Therefore, we advocate for the use of ChatGPT-4 in supporting researchers with intermediate experience in data analysis. With AI technologies like LLMs advancing rapidly, their integration with data analysis platforms promises to lower operational barriers, thereby enabling researchers to dedicate greater focus to the nuanced interpretation of analysis results. This development is likely to significantly advance epidemiological and medical research.
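
Two of the three analysis types named in this abstract, descriptive statistics and an intergroup comparison, can be sketched as follows. This is only an illustration of what such analyses compute: the choice of Welch's t statistic, the variable names, and the values are assumptions, and nothing here reproduces the survey data or the output of SAS, SPSS, R, or ChatGPT-4.

```python
import math
import statistics


def describe(values):
    """Basic descriptive statistics for one numeric variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "median": statistics.median(values),
    }


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    n_a, n_b = len(a), len(b)
    se_sq = var_a / n_a + var_b / n_b  # squared standard error of the difference
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se_sq)
    df = se_sq ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical weekly working hours in two groups (illustrative values only).
hours_group_a = [40, 42, 38, 45, 41]
hours_group_b = [35, 37, 36, 39, 33]

summary_a = describe(hours_group_a)
summary_b = describe(hours_group_b)
t_stat, dof = welch_t(hours_group_a, hours_group_b)
print(summary_a, summary_b, round(t_stat, 2), round(dof, 1))
```

Converting the t statistic to a p-value needs the t distribution, which the standard library does not provide; a statistics package would normally handle that step.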


Artificial Intelligence , Biomedical Research , Humans , Data Science , Epidemiologic Studies , Research Design
9.
Artif Intell Med ; 150: 102800, 2024 Apr.
Article En | MEDLINE | ID: mdl-38553146

Image segmentation is one of the vital steps in medical image analysis. A large number of methods based on convolutional neural networks have emerged, which can extract abstract features from multiple-modality medical images, learn valuable information that is difficult for humans to recognize, and obtain more reliable results than traditional image segmentation approaches. U-Net, due to its simple structure and excellent performance, is widely used in medical image segmentation. In this paper, to further improve the performance of U-Net, we propose a channel and space compound attention (CSCA) convolutional neural network, abbreviated CSCA U-Net, which increases the network depth and employs a double squeeze-and-excitation (DSE) block in the bottleneck layer to enhance feature extraction and obtain more high-level semantic features. Moreover, the characteristics of the proposed method are three-fold: (1) channel and space compound attention (CSCA) block, (2) cross-layer feature fusion (CLFF), and (3) deep supervision (DS). Extensive experiments on several available medical image datasets, including Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, ETIS, CVC-T, 2018 Data Science Bowl (2018 DSB), ISIC 2018, and JSUAH-Cerebellum, show that CSCA U-Net achieves competitive results and significantly improves generalization performance. The codes and trained models are available at https://github.com/xiaolanshu/CSCA-U-Net.
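
For readers unfamiliar with the building block mentioned above, the following is a minimal sketch of a standard single squeeze-and-excitation gate written in plain Python. It is not the paper's double squeeze-and-excitation (DSE) block, whose design the abstract does not describe; the weights and the tiny feature map are hypothetical, and a real network would use a deep learning framework with learned weights.

```python
import math


def squeeze_excite(feature_map, w_reduce, w_expand):
    """Apply a standard squeeze-and-excitation gate to a C x H x W feature map.

    feature_map: list of C channels, each a list of rows of floats.
    w_reduce:    R x C weight matrix (bottleneck layer), hypothetical values.
    w_expand:    C x R weight matrix (expansion layer), hypothetical values.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w_reduce
    ]
    # Excitation, step 2: expansion layer followed by a sigmoid gate in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w_expand
    ]
    # Scale: reweight every channel by its gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Tiny 2-channel, 2 x 2 feature map with made-up values.
features = [
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0, spatial mean 2.5
    [[0.0, 0.0], [0.0, 0.0]],  # channel 1, spatial mean 0.0
]
w_reduce = [[1.0, 1.0]]      # 2 channels -> 1 hidden unit
w_expand = [[1.0], [-1.0]]   # 1 hidden unit -> 2 channel gates

scaled, gates = squeeze_excite(features, w_reduce, w_expand)
print(gates)
```

The gate lets the network emphasise informative channels and suppress others using only a global summary of each channel, which is why such blocks add few parameters.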


Data Science , Learning , Humans , Neural Networks, Computer , Semantics , Image Processing, Computer-Assisted
10.
J Am Chem Soc ; 146(12): 8536-8546, 2024 Mar 27.
Article En | MEDLINE | ID: mdl-38480482

Methods to access chiral sulfur(VI) pharmacophores are of interest in medicinal and synthetic chemistry. We report the desymmetrization of unprotected sulfonimidamides via asymmetric acylation with a cinchona-phosphinate catalyst. The desired products are formed in excellent yield and enantioselectivity with no observed bis-acylation. A data-science-driven approach to substrate scope evaluation was coupled to high throughput experimentation (HTE) to facilitate statistical modeling in order to inform mechanistic studies. Reaction kinetics, catalyst structural studies, and density functional theory (DFT) transition state analysis elucidated the turnover-limiting step to be the collapse of the tetrahedral intermediate and provided key insights into the catalyst-substrate structure-activity relationships responsible for the origin of the enantioselectivity. This study offers a reliable method for accessing enantioenriched sulfonimidamides to propel their application as pharmacophores and serves as an example of the mechanistic insight that can be gleaned from integrating data science and traditional physical organic techniques.


Cinchona Alkaloids , Data Science , Molecular Structure , Stereoisomerism , Cinchona Alkaloids/chemistry , Catalysis , Acylation
11.
Brief Bioinform ; 25(2)2024 Jan 22.
Article En | MEDLINE | ID: mdl-38493340

Translational bioinformatics and data science play a crucial role in biomarker discovery, as they enable translational research and help to bridge the gap between bench research and bedside clinical applications. Thanks to newer and faster molecular profiling technologies and falling costs, there are many opportunities for researchers to explore the molecular and physiological mechanisms of diseases. Biomarker discovery enables researchers to better characterize patients, enables early detection and intervention/prevention, and predicts treatment responses. Due to increasing prevalence and rising treatment costs, mental health (MH) disorders have become an important area for biomarker discovery with the goal of improved patient diagnostics, treatment and care. Exploration of underlying biological mechanisms is the key to understanding the pathogenesis and pathophysiology of MH disorders. In an effort to better understand the underlying mechanisms of MH disorders, we reviewed the major accomplishments in the MH space from a bioinformatics and data science perspective, summarized existing knowledge derived from molecular and cellular data, and described challenges and areas of opportunity in this space.


Biomedical Research , Mental Health , Humans , Data Science , Computational Biology , Biomarkers
13.
J Biomed Inform ; 151: 104602, 2024 03.
Article En | MEDLINE | ID: mdl-38346530

OBJECTIVE: An applied problem facing all areas of data science is harmonizing data sources. Joining data from multiple origins with unmapped and only partially overlapping features is a prerequisite to developing and testing robust, generalizable algorithms, especially in healthcare. This integration is usually performed using meta-data such as feature names, which may be unavailable or ambiguous. Our goal is to design methods that create a mapping between structured tabular datasets derived from electronic health records independent of meta-data. METHODS: We evaluate methods in the challenging case of numeric features without reliable and distinctive univariate summaries, such as nearly Gaussian and binary features. We assume that a small set of features are a priori mapped between two datasets, which share unknown identical features and possibly many unrelated features. We expect inter-feature relationships to be the main source of identification. We compare the performance of contrastive learning methods for feature representations, novel partial auto-encoders, mutual-information graph optimizers, and simple statistical baselines on simulated data, public datasets, the MIMIC-III medical-record changeover, and perioperative records from before and after a medical-record system change. Performance was evaluated using both mapping of identical features and reconstruction accuracy of examples in the format of the other dataset. RESULTS: Contrastive learning-based methods overall performed the best, often substantially beating the literature baseline in matching and reconstruction, especially in the more challenging real data experiments. Partial auto-encoder methods showed on-par matching with contrastive methods in all synthetic and some real datasets, along with good reconstruction. However, the statistical method we created performed reasonably well in many cases, with much less dependence on hyperparameter tuning. When validating feature match output in the EHR dataset, we found that some mistakes were actually a surrogate or related feature, as reviewed by two subject matter experts. CONCLUSION: In simulation studies and real-world examples, we find that inter-feature relationships are effective at identifying matching or closely related features across tabular datasets when meta-data is not available. Decoder architectures are also reasonably effective at imputing features without an exact match.
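
The idea of identifying features through inter-feature relationships can be sketched with a toy baseline: describe each unmapped column by its correlations with the already-mapped (anchor) columns, then pair columns across datasets whose correlation "fingerprints" are closest. This is an illustrative assumption, not the authors' statistical baseline or any of their learned methods, and all column names and values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def fingerprint(column, anchors):
    """Correlation of one column with each anchor column, in anchor order."""
    return [pearson_r(column, anchor) for anchor in anchors]


def match_features(unmapped_a, anchors_a, unmapped_b, anchors_b):
    """Map each unmapped column name in A to the closest column name in B."""
    prints_b = {
        name: fingerprint(col, anchors_b) for name, col in unmapped_b.items()
    }
    mapping = {}
    for name_a, col_a in unmapped_a.items():
        print_a = fingerprint(col_a, anchors_a)
        mapping[name_a] = min(
            prints_b, key=lambda name_b: math.dist(print_a, prints_b[name_b])
        )
    return mapping


# Dataset A: two anchor columns plus two columns with unknown counterparts.
anchors_a = [[1, 2, 3, 4, 5], [5, 3, 4, 1, 2]]
unmapped_a = {"feat_x": [2, 4, 6, 8, 10], "feat_y": [-5, -3, -4, -1, -2]}

# Dataset B: different rows, same anchors in the same order, unnamed columns.
anchors_b = [[2, 4, 1, 6, 3, 5], [3, 1, 6, 2, 5, 4]]
unmapped_b = {"col_0": [7, 9, 4, 8, 5, 6], "col_1": [6, 12, 3, 18, 9, 15]}

mapping = match_features(unmapped_a, anchors_a, unmapped_b, anchors_b)
print(mapping)
```

This greedy version does not force a one-to-one mapping and uses only linear correlations with the anchors, so it can fail where richer inter-feature structure is needed.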


Algorithms , Electronic Health Records , Computer Simulation , Data Science , Motivation
14.
J Am Med Inform Assoc ; 31(5): 1051-1061, 2024 Apr 19.
Article En | MEDLINE | ID: mdl-38412331

BACKGROUND: Predictive models show promise in healthcare, but their successful deployment is challenging due to limited generalizability. Current external validation often focuses on model performance with restricted feature use from the original training data, lacking insights into their suitability at external sites. Our study introduces an innovative methodology for evaluating features during both the development phase and the validation, focusing on creating and validating predictive models for post-surgery patient outcomes with improved generalizability. METHODS: Electronic health records (EHRs) from 4 countries (United States, United Kingdom, Finland, and Korea) were mapped to the OMOP Common Data Model (CDM), 2008-2019. Machine learning (ML) models were developed to predict post-surgery prolonged opioid use (POU) risks using data collected 6 months before surgery. Both local and cross-site feature selection methods were applied in the development and external validation datasets. Models were developed using Observational Health Data Sciences and Informatics (OHDSI) tools and validated on separate patient cohorts. RESULTS: Model development included 41 929 patients, 14.6% with POU. The external validation included 31 932 (UK), 23 100 (US), 7295 (Korea), and 3934 (Finland) patients with POU of 44.2%, 22.0%, 15.8%, and 21.8%, respectively. The top-performing model, Lasso logistic regression, achieved an area under the receiver operating characteristic curve (AUROC) of 0.75 during local validation and 0.69 (SD = 0.02) (averaged) in external validation. Models trained with cross-site feature selection significantly outperformed those using only features from the development site through external validation (P < .05). CONCLUSIONS: Using EHRs across four countries mapped to the OMOP CDM, we developed generalizable predictive models for POU. 
Our approach demonstrates the significant impact of cross-site feature selection in improving model performance, underscoring the importance of incorporating diverse feature sets from various clinical settings to enhance the generalizability and utility of predictive healthcare models.
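
The AUROC metric reported above can be computed directly from its definition: the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case. The sketch below uses made-up labels and scores, not study data, and is independent of the OHDSI tooling the authors used.

```python
def auroc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = prolonged opioid use) and predicted risks.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]

value = auroc(labels, scores)
print(round(value, 3))
```

The pairwise loop is quadratic in the number of patients; for cohorts of the size reported in the abstract, a rank-based implementation from a statistics library would be the practical choice.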


Data Science , Medical Informatics , Humans , Logistic Models , United Kingdom , Finland
15.
Redox Biol ; 70: 103061, 2024 Apr.
Article En | MEDLINE | ID: mdl-38341954

RATIONALE: MER proto-oncogene tyrosine kinase (MerTK) is a key receptor for the clearance of apoptotic cells (efferocytosis) and plays important roles in redox-related human diseases. We explore MerTK biology in human cells, tissues, and diseases based on big data analytics. METHODS: Human RNA-seq and scRNA-seq data from about 42,700 samples were obtained from the NCBI Gene Expression Omnibus and analyzed by QIAGEN Ingenuity Pathway Analysis (IPA) with about 170,000 crossover analyses. MerTK expression was quantified as Log2 (FPKM + 0.1). RESULTS: We found that, in human cells, MerTK is highly expressed in macrophages, monocytes, progenitor cells, alpha-beta T cells, plasma B cells, myeloid cells, and endothelial cells (ECs). In human tissues, MerTK has higher expression in plaque, blood vessels, heart, liver, sensory system, artificial tissue, bone, adrenal gland, central nervous system (CNS), and connective tissue. Compared to normal conditions, MerTK expression in related tissues is altered in many human diseases, including cardiovascular diseases, cancer, and brain disorders. Interestingly, MerTK expression also shows sex differences in many tissues, indicating that MerTK may have different impacts in males and females. Finally, based on our proteomics from primary human aortic ECs, we validated the functions of MerTK in several human diseases, such as cancer, aging, kidney failure and heart failure. CONCLUSIONS: Our big data analytics suggest that MerTK may be a promising therapeutic target, but how it should be modulated depends on the disease types and sex differences. For example, MerTK inhibition emerges as a new strategy for cancer therapy owing to its counteracting effect on anti-tumor immunity, while MerTK restoration represents a promising treatment for atherosclerosis and myocardial infarction, as MerTK is cleaved in these disease conditions.
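
The expression measure stated in the methods, Log2 (FPKM + 0.1), is a simple transform that can be sketched as follows. The 0.1 offset comes from the abstract; the sample names and FPKM values are hypothetical.

```python
import math

PSEUDOCOUNT = 0.1  # offset from the abstract; keeps log2 defined at FPKM = 0


def log2_expression(fpkm):
    """Return log2(FPKM + 0.1) for a non-negative FPKM value."""
    if fpkm < 0:
        raise ValueError("FPKM cannot be negative")
    return math.log2(fpkm + PSEUDOCOUNT)


# Hypothetical FPKM values for one gene in three samples.
samples = {"sample_a": 0.0, "sample_b": 0.9, "sample_c": 7.9}
transformed = {name: log2_expression(v) for name, v in samples.items()}
print(transformed)
```

With this offset, a sample with no detected expression maps to log2(0.1), roughly -3.32, instead of negative infinity, so such samples can still be plotted and compared.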


Receptor Protein-Tyrosine Kinases , c-Mer Tyrosine Kinase , Female , Humans , Male , Apoptosis/genetics , c-Mer Tyrosine Kinase/genetics , Data Science , Endothelial Cells/metabolism , Genomics , Neoplasms/metabolism , Phagocytosis , Proto-Oncogene Proteins/genetics , Proto-Oncogene Proteins/metabolism , Receptor Protein-Tyrosine Kinases/genetics , Receptor Protein-Tyrosine Kinases/metabolism , Brain Diseases/metabolism
16.
JMIR Med Educ ; 10: e46740, 2024 Feb 21.
Article En | MEDLINE | ID: mdl-38381477

BACKGROUND: The key to the digital leveling-up strategy of the National Health Service is the development of a digitally proficient leadership. The National Health Service Digital Academy (NHSDA) Digital Health Leadership program was designed to support emerging digital leaders to acquire the necessary skills to facilitate transformation. This study examined the influence of the program on professional identity formation as a means of creating a more proficient digital health leadership. OBJECTIVE: This study aims to examine the impact of the NHSDA program on participants' perceptions of themselves as digital health leaders. METHODS: We recruited 41 participants from 2 cohorts of the 2-year NHSDA program in this mixed methods study, all of whom had completed it >6 months before the study. The participants were initially invited to complete a web-based scoping questionnaire. This involved both quantitative and qualitative responses to prompts. Frequencies of responses were aggregated, while free-text comments from the questionnaire were analyzed inductively. The content of the 30 highest-scoring dissertations was also reviewed by 2 independent authors. A total of 14 semistructured interviews were then conducted with a subset of the cohort. These focused on individuals' perceptions of digital leadership and the influence of the course on the attainment of skills. In total, 3 in-depth focus groups were then conducted with participants to examine shared perceptions of professional identity as digital health leaders. The transcripts from the interviews and focus groups were aligned with a previously published examination of leadership as a framework. RESULTS: Of the 41 participants, 42% (17/41) were in clinical roles, 34% (14/41) were in program delivery or management roles, 20% (8/41) were in data science roles, and 5% (2/41) were in "other" roles. 
Interviews and focus groups highlighted that the course influenced 8 domains of professional identity: commitment to the profession, critical thinking, goal orientation, mentoring, perception of the profession, socialization, reflection, and self-efficacy. The dissertation of the practice model, in which candidates undertake digital projects within their organizations supported by faculty, largely impacted metacognitive skill acquisition and goal orientation. However, the program also affected participants' values and direction within the wider digital health community. According to the questionnaire, after graduation, 59% (24/41) of the participants changed roles in search of more prominence within digital leadership, with 46% (11/24) reporting that the course was a strong determinant of this change. CONCLUSIONS: A digital leadership course aimed at providing attendees with the necessary attributes to guide transformation can have a significant impact on professional identity formation. This can create a sense of belonging to a wider health leadership structure and facilitate the attainment of organizational and national digital targets. This effect is diminished by a lack of locoregional support for professional development.


Digital Health , State Medicine , Humans , Academies and Institutes , Data Science , Faculty
17.
Bioinformatics ; 40(2)2024 Feb 01.
Article En | MEDLINE | ID: mdl-38402507

MOTIVATION: Genomic intervals are one of the most prevalent data structures in computational genome biology, and are used to represent features ranging from genes, to DNA binding sites, to disease variants. Operations on genomic intervals provide a language for asking questions about relationships between features. While there are excellent interval arithmetic tools for the command line, they are not smoothly integrated into Python, one of the most popular general-purpose computational and visualization environments. RESULTS: Bioframe is a library to enable flexible and performant operations on genomic interval dataframes in Python. Bioframe extends the Python data science stack to use cases for computational genome biology by building directly on top of two of the most commonly used Python libraries, NumPy and Pandas. The bioframe API enables flexible name and column orders, and decouples operations from data formats to avoid unnecessary conversions, a common scourge for bioinformaticians. Bioframe achieves these goals while maintaining high performance and a rich set of features. AVAILABILITY AND IMPLEMENTATION: Bioframe is open-source under MIT license, cross-platform, and can be installed from the Python Package Index. The source code is maintained by Open2C on GitHub at https://github.com/open2c/bioframe.
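
As an illustration of what an operation on genomic intervals looks like, the sketch below finds overlapping pairs between two small interval sets in plain Python. It deliberately does not use or imitate the bioframe API; the half-open, 0-based coordinate convention (as in BED files) is an assumption, and the coordinates are invented.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def overlap_pairs(set1, set2):
    """All (interval1, interval2) pairs that overlap, by brute force."""
    return [(a, b) for a in set1 for b in set2 if overlaps(a, b)]


# Hypothetical gene and peak intervals as (chrom, start, end) tuples.
genes = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
peaks = [("chr1", 150, 250), ("chr1", 200, 300), ("chr2", 50, 120)]

pairs = overlap_pairs(genes, peaks)
for gene, peak in pairs:
    print(gene, "overlaps", peak)
```

Under the half-open convention the gene ending at 200 and the peak starting at 200 only touch and do not overlap. The brute-force comparison is quadratic, whereas dataframe-based libraries use sorted or vectorised approaches for genome-scale data.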


Computational Biology , Genomics , Gene Library , Binding Sites , Data Science
18.
PLoS One ; 19(2): e0299327, 2024.
Article En | MEDLINE | ID: mdl-38422040

The growing demand for data scientists in both the global and Dutch labour markets has led to an increase in data science and artificial intelligence (AI) master programs offered by universities. However, there is still a lack of clarity regarding the specific skills of data scientists. This study addresses this issue by employing Correlated Topic Modeling (CTM) to analyse the content of 41 master programs offered by 11 Dutch universities and an interuniversity combined program. We assess the differences and similarities in the core skills taught by these programs, determine the subject-specific and general nature of the skills, and provide a comparison between the different types of universities offering these programs. Our analysis reveals that data processing, statistics, research, and ethics are the core competencies in Dutch data science and AI master programs. General universities tend to focus on research skills, while technical universities lean more towards IT and electronics skills. Broad-focussed data science and AI programs generally concentrate on data processing, information technology, electronics, and research, while subject-specific programs give priority to statistics and ethics. This research enhances the understanding of the diverse skills of Dutch data science graduates, providing valuable insights for employers, academic institutions, and prospective students.


Artificial Intelligence , Data Science , Humans , Universities , Schools , Data Mining
19.
Epigenomics ; 16(5): 273-276, 2024 Mar.
Article En | MEDLINE | ID: mdl-38312014

Tweetable abstract: This article reviews machine learning models that leverage epigenomic data for predicting multifactorial diseases and symptoms, as well as how such models can be used to explore new research questions.


DNA Methylation , Epigenesis, Genetic , Humans , Epigenome , Data Science , Epigenomics
20.
Comput Biol Med ; 171: 108124, 2024 Mar.
Article En | MEDLINE | ID: mdl-38412691

BACKGROUND: Aldosterone plays a key role in the neurohormonal drive of heart failure. Systematic prioritization of drug targets using bioinformatics and database-driven decision-making can provide a competitive advantage in therapeutic R&D. This study investigated the evidence on the druggability of these aldosterone targets in heart failure. METHODS: The target disease predictability of mineralocorticoid receptors (MR) and aldosterone synthase (AS) in cardiac failure was evaluated using Open Targets target-disease association scores. The Open Targets database collections were downloaded to MongoDB and queried according to the desired aggregation level, and the results were retrieved from the Europe PMC (data type: text mining), ChEMBL (data type: drugs), Open Targets Genetics Portal (data type: genetic associations), and IMPC (data type: genetic associations) databases. The target tractability of MR and AS in the cardiovascular system was investigated by computing activity scores in a curated ChEMBL database using supervised machine learning. RESULTS: The medians of the association scores of the MR and AS groups were similar, indicating a comparable predictability of the target disease. The median activity score of the MR group was significantly lower than that of the AS group, indicating that AS has higher target tractability than MR [Hodges-Lehmann difference 0.62 (95% CI 0.53-0.70), p < 0.0001]. The cumulative distributions of the overall multiplatform association scores of cardiac diseases with MR were considerably higher than with AS, indicating more advanced investigations on a wider range of disorders evaluated for MR (Kolmogorov-Smirnov D = 0.36, p = 0.0009). In curated ChEMBL, MR had a higher cumulative distribution of activity scores in experimental cardiovascular assays than AS (Kolmogorov-Smirnov D = 0.23, p < 0.0001). Database searches surfaced documented clinical trials for MR in heart failure but none for AS. CONCLUSIONS: Although its clinical development has lagged behind that of MR, our findings indicate that AS is a promising therapeutic target for the treatment of cardiac failure. The multiplatform-integrated identification used in this study allowed us to comprehensively explore the available scientific evidence on MR and AS for heart failure therapy.
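
The two effect measures named in the results, the Hodges-Lehmann difference and the Kolmogorov-Smirnov D, can be illustrated with a minimal standard-library sketch. The abstract does not describe how the authors computed them, so the function names and the toy scores below are assumptions for illustration only; confidence intervals and p-values are not computed here.

```python
from statistics import median
from typing import Sequence


def hodges_lehmann_shift(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Hodges-Lehmann estimate: median of all pairwise differences x_i - y_j."""
    return median(xi - yj for xi in x for yj in y)


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D: largest gap between the empirical CDFs."""
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for v in sorted(set(x) | set(y)):
        fx = sum(1 for a in x if a <= v) / len(x)
        fy = sum(1 for b in y if b <= v) / len(y)
        d = max(d, abs(fx - fy))
    return d


# Toy scores (hypothetical, not data from the study).
group_a = [5, 7, 9]
group_b = [1, 2, 6]

shift = hodges_lehmann_shift(group_a, group_b)
d_stat = ks_statistic(group_a, group_b)
print("Hodges-Lehmann shift:", shift)
print("KS D:", round(d_stat, 3))
```

The Hodges-Lehmann estimate summarises how far one group is shifted relative to the other, while D measures the largest vertical distance between the two cumulative distributions, which is why the abstract reports the former alongside medians and the latter alongside cumulative distributions.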


Aldosterone , Heart Failure , Humans , Data Science , Heart Failure/drug therapy , Heart , Enzyme Inhibitors , Cardiotonic Agents , Computational Biology
...