Results 1 - 20 of 937
1.
AMA J Ethics ; 26(4): E306-314, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38564745

ABSTRACT

Drug shortages are a persistent and serious problem in the United States, affecting patient care and health care costs. This article canvasses factors that contribute to drug shortages, such as manufacturing complexity, price, and quality inspection records. This article further proposes an early warning system and payment, contracting, and pricing innovations to mitigate drug shortages and offers data-driven recommendations to stakeholders looking to protect the supply of quality medicines.


Subjects
Data Science , Drug Industry , Humans , United States , Health Care Costs
3.
J Am Chem Soc ; 146(12): 8536-8546, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38480482

ABSTRACT

Methods to access chiral sulfur(VI) pharmacophores are of interest in medicinal and synthetic chemistry. We report the desymmetrization of unprotected sulfonimidamides via asymmetric acylation with a cinchona-phosphinate catalyst. The desired products are formed in excellent yield and enantioselectivity with no observed bis-acylation. A data-science-driven approach to substrate scope evaluation was coupled to high throughput experimentation (HTE) to facilitate statistical modeling in order to inform mechanistic studies. Reaction kinetics, catalyst structural studies, and density functional theory (DFT) transition state analysis elucidated the turnover-limiting step to be the collapse of the tetrahedral intermediate and provided key insights into the catalyst-substrate structure-activity relationships responsible for the origin of the enantioselectivity. This study offers a reliable method for accessing enantioenriched sulfonimidamides to propel their application as pharmacophores and serves as an example of the mechanistic insight that can be gleaned from integrating data science and traditional physical organic techniques.


Subjects
Cinchona Alkaloids , Data Science , Molecular Structure , Stereoisomerism , Cinchona Alkaloids/chemistry , Catalysis , Acylation
4.
J Glob Health ; 14: 04070, 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38547497

ABSTRACT

Background: OpenAI's Chat Generative Pre-trained Transformer 4.0 (ChatGPT-4), an emerging artificial intelligence (AI)-based large language model (LLM), has been receiving increasing attention from the medical research community for its innovative 'Data Analyst' feature. We aimed to compare the capabilities of ChatGPT-4 against traditional biostatistical software (i.e. SAS, SPSS, R) in statistically analysing epidemiological research data. Methods: We used a data set from the China Health and Nutrition Survey, comprising 9317 participants and 29 variables (e.g. gender, age, educational level, marital status, income, occupation, weekly working hours, survival status). Two researchers independently evaluated the data analysis capabilities of GPT-4's 'Data Analyst' feature against SAS, SPSS, and R across three commonly used epidemiological analysis methods: Descriptive statistics, intergroup analysis, and correlation analysis. We used an internally developed evaluation scale to assess and compare the consistency of results, analytical efficiency of coding or operations, user-friendliness, and overall performance between ChatGPT-4, SAS, SPSS, and R. Results: In descriptive statistics, ChatGPT-4 showed high consistency of results, greater analytical efficiency of code or operations, and more intuitive user-friendliness compared to SAS, SPSS, and R. In intergroup comparisons and correlational analyses, despite minor discrepancies in statistical outcomes for certain analysis tasks with SAS, SPSS, and R, ChatGPT-4 maintained high analytical efficiency and exceptional user-friendliness. Thus, employing ChatGPT-4 can significantly lower the operational threshold for conducting epidemiological data analysis while maintaining consistency with traditional biostatistical software's outcome, requiring only specific, clear analysis instructions without any additional operations or code writing. 
Conclusions: We found ChatGPT-4 to be a powerful auxiliary tool for statistical analysis in epidemiological research. However, it showed limitations in result consistency and in applying more advanced statistical methods. Therefore, we advocate for the use of ChatGPT-4 in supporting researchers with intermediate experience in data analysis. With AI technologies like LLMs advancing rapidly, their integration with data analysis platforms promises to lower operational barriers, thereby enabling researchers to dedicate greater focus to the nuanced interpretation of analysis results. This development is likely to significantly advance epidemiological and medical research.


Subjects
Artificial Intelligence , Biomedical Research , Humans , Data Science , Epidemiologic Studies , Research Design
5.
Artif Intell Med ; 150: 102800, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38553146

ABSTRACT

Image segmentation is one of the vital steps in medical image analysis. A large number of methods based on convolutional neural networks have emerged, which can extract abstract features from multiple-modality medical images, learn valuable information that is difficult to recognize by humans, and obtain more reliable results than traditional image segmentation approaches. U-Net, due to its simple structure and excellent performance, is widely used in medical image segmentation. In this paper, to further improve the performance of U-Net, we propose a channel and space compound attention (CSCA) convolutional neural network, CSCA U-Net in abbreviation, which increases the network depth and employs a double squeeze-and-excitation (DSE) block in the bottleneck layer to enhance feature extraction and obtain more high-level semantic features. Moreover, the characteristics of the proposed method are three-fold: (1) channel and space compound attention (CSCA) block, (2) cross-layer feature fusion (CLFF), and (3) deep supervision (DS). Extensive experiments on several available medical image datasets, including Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, ETIS, CVC-T, 2018 Data Science Bowl (2018 DSB), ISIC 2018, and JSUAH-Cerebellum, show that CSCA U-Net achieves competitive results and significantly improves generalization performance. The codes and trained models are available at https://github.com/xiaolanshu/CSCA-U-Net.


Subjects
Data Science , Learning , Humans , Neural Networks, Computer , Semantics , Image Processing, Computer-Assisted
7.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38493340

ABSTRACT

Translational bioinformatics and data science play a crucial role in biomarker discovery, as they enable translational research and help to bridge the gap between bench research and bedside clinical applications. Thanks to newer and faster molecular profiling technologies and falling costs, there are many opportunities for researchers to explore the molecular and physiological mechanisms of diseases. Biomarker discovery enables researchers to better characterize patients, enables early detection and intervention/prevention, and predicts treatment responses. Due to increasing prevalence and rising treatment costs, mental health (MH) disorders have become an important area for biomarker discovery with the goal of improved patient diagnostics, treatment and care. Exploration of underlying biological mechanisms is the key to understanding the pathogenesis and pathophysiology of MH disorders. In an effort to better understand the underlying mechanisms of MH disorders, we reviewed the major accomplishments in the MH space from a bioinformatics and data science perspective, summarized existing knowledge derived from molecular and cellular data and described challenges and areas of opportunity in this space.


Subjects
Biomedical Research , Mental Health , Humans , Data Science , Computational Biology , Biomarkers
8.
BMC Palliat Care ; 23(1): 62, 2024 Mar 02.
Article in English | MEDLINE | ID: mdl-38429698

ABSTRACT

BACKGROUND: Breakthrough cancer pain (BTCP) is primarily managed at home and can stem from physical exertion and emotional distress triggers. Beyond these triggers, the impact of the ambient environment on pain occurrence and intensity has not been investigated. This study explores the impact of environmental factors on the frequency and severity of BTCP in the home context from the perspective of patients with advanced cancer and their primary family caregivers. METHODS: A health monitoring system was deployed in the homes of patient and family caregiver dyads to collect self-reported pain events and contextual environmental data (light, temperature, humidity, barometric pressure, ambient noise). Correlation analysis examined the relationship between environmental factors and 1) individually reported pain episodes and 2) overall pain trends in a 24-hour time window. Machine learning models were developed to explore how environmental factors may predict BTCP episodes. RESULTS: Correlation strength between environmental variables and pain reports varied among dyads. Light and noise showed moderate association (r = 0.50-0.70) in 66% of total deployments. The strongest correlation for individual pain events involved barometric pressure (r = 0.90); for pain trends over 24 hours, the strongest correlations involved humidity (r = 0.84) and barometric pressure (r = 0.83). Machine learning achieved 70% BTCP prediction accuracy. CONCLUSION: Our study provides insights into the role of ambient environmental factors in BTCP and offers novel opportunities to inform personalized pain management strategies and to remotely support patients and their caregivers in self-symptom management. This research provides preliminary evidence of the impact of ambient environmental factors on BTCP in the home setting.
We utilized real-world data and correlation analysis to provide an understanding of the relationship between environmental factors and cancer pain which may be helpful to others engaged in similar work.


Subjects
Breakthrough Pain , Cancer Pain , Neoplasms , Humans , Analgesics, Opioid , Data Science , Pain Management , Neoplasms/complications
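The correlation analysis mentioned in the abstract above reports r values between environmental readings and pain reports. As a rough illustration of how such a coefficient is computed, here is a minimal Pearson correlation sketch in plain Python. The abstract does not say which correlation coefficient the authors used, so Pearson is an assumption, and the `humidity` and `pain_score` values are made up for the example; they are not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical readings, invented for illustration only.
humidity = [40.0, 45.0, 50.0, 55.0, 60.0]
pain_score = [2.0, 3.0, 3.0, 5.0, 6.0]
r = pearson_r(humidity, pain_score)
print(f"r = {r:.2f}")
```

A value near +1 or -1 indicates a strong linear association; it says nothing about causation, which matches the abstract's framing of its findings as preliminary.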
9.
BMC Med Inform Decis Mak ; 24(1): 58, 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38408983

ABSTRACT

BACKGROUND: To gain insight into the real-life care of patients in the healthcare system, data from hospital information systems and insurance systems are required. Consequently, linking clinical data with claims data is necessary. To ensure their syntactic and semantic interoperability, the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) from the Observational Health Data Sciences and Informatics (OHDSI) community was chosen. However, there is no detailed guide that would allow researchers to follow a generic process for data harmonization, i.e. the transformation of local source data into the standardized OMOP CDM format. Thus, the aim of this paper is to conceptualize a generic data harmonization process for OMOP CDM. METHODS: For this purpose, we conducted a literature review focusing on publications that address the harmonization of clinical or claims data in OMOP CDM. Subsequently, the process steps used and their chronological order as well as applied OHDSI tools were extracted for each included publication. The results were then compared to derive a generic sequence of the process steps. RESULTS: From 23 publications included, a generic data harmonization process for OMOP CDM was conceptualized, consisting of nine process steps: dataset specification, data profiling, vocabulary identification, coverage analysis of vocabularies, semantic mapping, structural mapping, extract-transform-load-process, qualitative and quantitative data quality analysis. Furthermore, we identified seven OHDSI tools which supported five of the process steps. CONCLUSIONS: The generic data harmonization process can be used as a step-by-step guide to assist other researchers in harmonizing source data in OMOP CDM.


Subjects
Medical Informatics , Vocabulary , Humans , Databases, Factual , Data Science , Semantics , Electronic Health Records
10.
PLoS One ; 19(2): e0294307, 2024.
Article in English | MEDLINE | ID: mdl-38412191

ABSTRACT

OBJECTIVE: The unprecedented events of 2020 required a pivot in scientific training to better prepare the biomedical research workforce to address global pandemics, structural racism, and social inequities that devastate human health individually and erode it collectively. Furthermore, this pivot had to be accomplished in the virtual environment given the nation-wide lockdown. METHODS: These needs and context led to leveraging of the San Francisco Building Infrastructure Leading to Diversity (SF BUILD) theories of change to innovate a Virtual BUILD Research Collaboratory (VBRC). The purpose of VBRC was to train Black, Indigenous, and people of color (BIPOC) students to apply their unique perspectives to biomedical research. These training activities were evaluated using a pre-post survey design that included both validated and new psychosocial scales. A new scale was piloted to measure culturally relevant pedagogy. RESULTS: VBRC scholars increased science identity on two items: thinking of myself as a scientist (+1 point, p = 0.006) and belonging to a community of scientists (+1 point, p = 0.069). Overall, scholars' perceived stress also decreased over VBRC (-2.35 points, p = 0.02). Post VBRC, scholars had high agency scores (µ = 11.02, Md = 12, range = 6-12, σ = 1.62) and cultural humility scores (µ = 22.11, Md = 23, range = 12-24, σ = 2.71). No notable race/ethnic differences were found in any measures. CONCLUSIONS: Taken together, our innovative approach to data science training for BIPOC in unprecedented times shows promise for better preparing the workforce critically needed to address the fundamental gaps in knowledge at the intersection of public health, structural racism, and biomedical sciences.


Subjects
Biomedical Research , Racism , Humans , Racism/prevention & control , Data Science , Workforce , Biomedical Research/education , Students
11.
Bioinformatics ; 40(2)2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38402507

ABSTRACT

MOTIVATION: Genomic intervals are one of the most prevalent data structures in computational genome biology, and are used to represent features ranging from genes, to DNA binding sites, to disease variants. Operations on genomic intervals provide a language for asking questions about relationships between features. While there are excellent interval arithmetic tools for the command line, they are not smoothly integrated into Python, one of the most popular general-purpose computational and visualization environments. RESULTS: Bioframe is a library to enable flexible and performant operations on genomic interval dataframes in Python. Bioframe extends the Python data science stack to use cases for computational genome biology by building directly on top of two of the most commonly used Python libraries, NumPy and Pandas. The bioframe API enables flexible name and column orders, and decouples operations from data formats to avoid unnecessary conversions, a common scourge for bioinformaticians. Bioframe achieves these goals while maintaining high performance and a rich set of features. AVAILABILITY AND IMPLEMENTATION: Bioframe is open-source under the MIT license, cross-platform, and can be installed from the Python Package Index. The source code is maintained by Open2C on GitHub at https://github.com/open2c/bioframe.


Subjects
Computational Biology , Genomics , Gene Library , Binding Sites , Data Science
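To make the idea of "operations on genomic intervals" in the abstract above concrete, the following sketch shows an interval-overlap query in plain Python. It is not bioframe's API and makes no claim about how bioframe is implemented. The `Interval` type, the half-open coordinate convention, and the sample `genes` and `sites` are assumptions chosen for illustration, and the quadratic scan is only suitable for small inputs.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A genomic interval; half-open [start, end) is assumed here."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True when both intervals share a chromosome and at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def overlap_pairs(left, right):
    """All (left, right) pairs that overlap, via a simple quadratic scan."""
    return [(a, b) for a in left for b in right if overlaps(a, b)]


# Hypothetical features, invented for illustration only.
genes = [
    Interval("chr1", 100, 200),
    Interval("chr1", 500, 600),
    Interval("chr2", 100, 200),
]
sites = [
    Interval("chr1", 150, 160),  # inside the first gene
    Interval("chr1", 200, 250),  # only touches the first gene's end
    Interval("chr2", 50, 101),   # shares one base with the chr2 gene
]
pairs = overlap_pairs(genes, sites)
for gene, site in pairs:
    print(gene, "overlaps", site)
```

Under the half-open convention, an interval that starts exactly where another ends does not overlap it, which is why the second site is not reported.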
12.
PLoS One ; 19(2): e0299327, 2024.
Article in English | MEDLINE | ID: mdl-38422040

ABSTRACT

The growing demand for data scientists in both the global and Dutch labour markets has led to an increase in data science and artificial intelligence (AI) master programs offered by universities. However, there is still a lack of clarity regarding the specific skills of data scientists. This study addresses this issue by employing Correlated Topic Modeling (CTM) to analyse the content of 41 master programs offered by 11 Dutch universities and an interuniversity combined program. We assess the differences and similarities in the core skills taught by these programs, determine the subject-specific and general nature of the skills, and provide a comparison between the different types of universities offering these programs. Our analysis reveals that data processing, statistics, research, and ethics are the core competencies in Dutch data science and AI master programs. General universities tend to focus on research skills, while technical universities lean more towards IT and electronics skills. Broad-focussed data science and AI programs generally concentrate on data processing, information technology, electronics, and research, while subject-specific programs give priority to statistics and ethics. This research enhances the understanding of the diverse skills of Dutch data science graduates, providing valuable insights for employers, academic institutions, and prospective students.


Subjects
Artificial Intelligence , Data Science , Humans , Universities , Schools , Data Mining
13.
Epigenomics ; 16(5): 273-276, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38312014

ABSTRACT

Tweetable abstract: This article reviews machine learning models that leverage epigenomic data for predicting multifactorial diseases and symptoms, as well as how such models can be utilized to explore new research questions.


Subjects
DNA Methylation , Epigenesis, Genetic , Humans , Epigenome , Data Science , Epigenomics
14.
Trials ; 25(1): 114, 2024 Feb 10.
Article in English | MEDLINE | ID: mdl-38336793

ABSTRACT

BACKGROUND: Economic incentives can improve clinical outcomes among in-care people living with HIV (PLHIV), but evidence is limited for their effectiveness among out-of-care PLHIV or those at risk of disengagement. We propose a type 1 hybrid effectiveness-implementation study to advance global knowledge about the use of economic incentives to strengthen the continuity of HIV care and accelerate global goals for HIV epidemic control. METHODS: The Rudi Kundini, Pamoja Kundini study will evaluate two implementation models of an economic incentive strategy for supporting two groups of PLHIV in Tanzania. Phase 1 of the study consists of a two-arm, cluster randomized trial across 32 health facilities to assess the effectiveness of a home visit plus one-time economic incentive on the proportion of out-of-care PLHIV with viral load suppression (< 1000 copies/ml) 6 months after enrollment (n = 640). Phase 2 is an individual 1:1 randomized controlled trial designed to determine the effectiveness of a short-term counseling and economic incentive program offered to in-care PLHIV who are predicted through machine learning to be at risk of disengaging from care on the outcome of viral load suppression at 12 months (n = 692). The program includes up to three incentives conditional upon visit attendance coupled with adapted counselling sessions for this population of PLHIV. Consistent with a hybrid effectiveness-implementation study design, phase 3 is a mixed methods evaluation to explore barriers and facilitators to strategy implementation in phases 1 and 2. Results will be used to guide optimization and scale-up of the incentive strategies, if effective, to the larger population of Tanzanian PLHIV who struggle with continuity of HIV care. DISCUSSION: Innovative strategies that recognize the dynamic process of lifelong retention in HIV care are urgently needed. 
Strategies such as conditional economic incentives are a simple and effective method for improving many health outcomes, including those on the HIV continuum. If coupled with other supportive services such as home visits (phase 1) or with tailored counselling (phase 2), economic incentives have the potential to strengthen engagement among the subpopulation of PLHIV who struggle with retention in care and could help to close the gap towards reaching global "95-95-95" goals for ending the AIDS epidemic. TRIAL REGISTRATION: Phase 1: ClinicalTrials.gov, NCT05248100, registered 2/21/2022. Phase 2: ClinicalTrials.gov, NCT05373095, registered 5/13/2022.


Subjects
HIV Infections , Motivation , Humans , Tanzania/epidemiology , Data Science , HIV Infections/diagnosis , HIV Infections/epidemiology , HIV Infections/therapy , Continuity of Patient Care , Randomized Controlled Trials as Topic , Clinical Trials, Phase II as Topic
15.
PLoS One ; 19(2): e0298036, 2024.
Article in English | MEDLINE | ID: mdl-38358964

ABSTRACT

BACKGROUND: Traditional risk assessment tools often lack accuracy when predicting the short- and long-term mortality following a non-ST-segment elevation myocardial infarction (NSTEMI) or Unstable Angina (UA) in specific populations. OBJECTIVE: To employ machine learning (ML) and stacked ensemble learning (EL) methods in predicting short- and long-term mortality in Asian patients diagnosed with NSTEMI/UA and to identify the associated features, subsequently evaluating these findings against established risk scores. METHODS: We analyzed data from the National Cardiovascular Disease Database for Malaysia (2006-2019), representing a diverse NSTEMI/UA Asian cohort. Algorithm development utilized in-hospital records of 9,518 patients, 30-day data from 7,133 patients, and 1-year data from 7,031 patients. This study utilized 39 features, including demographic, cardiovascular risk, medication, and clinical features. In the development of the stacked EL model, four base learner algorithms were employed: eXtreme Gradient Boosting (XGB), Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest (RF), with the Generalized Linear Model (GLM) serving as the meta learner. Significant features were chosen and ranked using ML feature importance with backward elimination. The predictive performance of the algorithms was assessed using the area under the curve (AUC) as a metric. Validation of the algorithms was conducted against the TIMI for NSTEMI/UA using a separate validation dataset, and the net reclassification index (NRI) was subsequently determined. RESULTS: Using both complete and reduced features, the algorithm performance achieved an AUC ranging from 0.73 to 0.89. The top-performing ML algorithm consistently surpassed the TIMI risk score for in-hospital, 30-day, and 1-year predictions (with AUC values of 0.88, 0.88, and 0.81, respectively, all p < 0.001), while the TIMI scores registered significantly lower at 0.55, 0.54, and 0.61.
This suggests the TIMI score tends to underestimate patient mortality risk. The net reclassification index (NRI) of the best ML algorithm for NSTEMI/UA patients across these periods yielded an NRI between 40-60% (p < 0.001) relative to the TIMI NSTEMI/UA risk score. Key features identified for both short- and long-term mortality included age, Killip class, heart rate, and Low-Molecular-Weight Heparin (LMWH) administration. CONCLUSIONS: In a broad multi-ethnic population, ML approaches outperformed conventional TIMI scoring in classifying patients with NSTEMI and UA. ML allows for the precise identification of unique characteristics within individual Asian populations, improving the accuracy of mortality predictions. Continuous development, testing, and validation of these ML algorithms holds the promise of enhanced risk stratification, thereby revolutionizing future management strategies and patient outcomes.


Subjects
Non-ST Elevated Myocardial Infarction , ST Elevation Myocardial Infarction , Humans , Non-ST Elevated Myocardial Infarction/diagnosis , Heparin, Low-Molecular-Weight , Data Science , Bayes Theorem , Angina, Unstable , Risk Assessment , Arrhythmias, Cardiac
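The abstract above compares models by area under the ROC curve (AUC). As a hedged illustration of what that metric measures, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case, counting ties as half. This is the standard rank-based definition, not the authors' code; the `died` labels and `risk` scores are invented for the example.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 outcomes; scores: predicted risk per case.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes and predicted risks, invented for illustration only.
died = [0, 0, 1, 1]
risk = [0.10, 0.40, 0.35, 0.80]
auc = roc_auc(died, risk)
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 corresponds to ranking no better than chance, which gives context for the reported TIMI values of 0.55 to 0.61 against 0.81 to 0.88 for the best ML model.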
16.
Prog Mol Biol Transl Sci ; 203: 83-97, 2024.
Article in English | MEDLINE | ID: mdl-38360007

ABSTRACT

Information technology (IT) now plays a significant role in daily life worldwide. The trajectory of data science and bioinformatics promises pioneering personalized therapies, reshaping medical landscapes and patient care. For RNA therapy to reach more patients, a comprehensive understanding of the application of data science and bioinformatics to this therapy is essential. This chapter therefore summarizes the application of data science and bioinformatics in RNA therapeutics. Data science applications in RNA therapy, such as data integration and analytics, machine learning, and drug development, are discussed. In addition, aspects of bioinformatics such as RNA design and evaluation, drug delivery system simulation, and databases for personalized medicine are covered. These insights shed light on existing evidence and open potential future directions. From there, scientists can elevate RNA-based therapeutics into an era of tailored treatments and revolutionary healthcare.


Subjects
Computational Biology , Data Science , Humans , Precision Medicine
17.
Comput Biol Med ; 171: 108124, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38412691

ABSTRACT

BACKGROUND: Aldosterone plays a key role in the neurohormonal drive of heart failure. Systematic prioritization of drug targets using bioinformatics and database-driven decision-making can provide a competitive advantage in therapeutic R&D. This study investigated the evidence on the druggability of these aldosterone targets in heart failure. METHODS: The target disease predictability of mineralocorticoid receptors (MR) and aldosterone synthase (AS) in cardiac failure was evaluated using Open Targets target-disease association scores. The Open Targets database collections were downloaded to MongoDB and queried according to the desired aggregation level, and the results were retrieved from the Europe PMC (data type: text mining), ChEMBL (data type: drugs), Open Targets Genetics Portal (data type: genetic associations), and IMPC (data type: genetic associations) databases. The target tractability of MR and AS in the cardiovascular system was investigated by computing activity scores in a curated ChEMBL database using supervised machine learning. RESULTS: The medians of the association scores of the MR and AS groups were similar, indicating a comparable predictability of the target disease. The median of the MR activity scores group was significantly lower than that of AS, indicating that AS has higher target tractability than MR [Hodges-Lehmann difference 0.62 (95% CI 0.53-0.70), p < 0.0001]. The cumulative distributions of the overall multiplatform association scores of cardiac diseases with MR were considerably higher than with AS, indicating more advanced investigations on a wider range of disorders evaluated for MR (Kolmogorov-Smirnov D = 0.36, p = 0.0009). In curated ChEMBL, MR had a higher cumulative distribution of activity scores in experimental cardiovascular assays than AS (Kolmogorov-Smirnov D = 0.23, p < 0.0001). Documented clinical trials for MR in heart failure surfaced in database searches, but none for AS.
CONCLUSIONS: Although its clinical development has lagged behind that of MR, our findings indicate that AS is a promising therapeutic target for the treatment of cardiac failure. The multiplatform-integrated identification used in this study allowed us to comprehensively explore the available scientific evidence on MR and AS for heart failure therapy.


Subjects
Aldosterone , Heart Failure , Humans , Data Science , Heart Failure/drug therapy , Heart , Enzyme Inhibitors , Cardiotonic Agents , Computational Biology
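The abstract above reports a Hodges-Lehmann difference between two groups of activity scores. For readers unfamiliar with that estimator, here is a minimal sketch of the two-sample version, which is the median of all pairwise differences between the groups. The function name and the sample `as_scores` and `mr_scores` are hypothetical, and the sketch does not compute the confidence interval or p-value quoted in the abstract.

```python
from statistics import median


def hodges_lehmann_shift(xs, ys):
    """Two-sample Hodges-Lehmann estimate: median of all x - y differences."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    return median(x - y for x in xs for y in ys)


# Hypothetical activity scores, invented for illustration only.
as_scores = [0.9, 1.1, 1.4]
mr_scores = [0.3, 0.5, 0.6]
shift = hodges_lehmann_shift(as_scores, mr_scores)
print(f"estimated shift = {shift:.2f}")
```

Because it is built from medians of differences, the estimate is less sensitive to outliers than a difference of means, which suits skewed score distributions.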
18.
Neuron ; 112(5): 698-717, 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38340718

ABSTRACT

Large language models (LLMs) are a new asset class in the machine-learning landscape. Here we offer a primer on defining properties of these modeling techniques. We then reflect on new modes of investigation in which LLMs can be used to reframe classic neuroscience questions to deliver fresh answers. We reason that LLMs have the potential to (1) enrich neuroscience datasets by adding valuable meta-information, such as advanced text sentiment, (2) summarize vast information sources to overcome divides between siloed neuroscience communities, (3) enable previously unthinkable fusion of disparate information sources relevant to the brain, (4) help deconvolve which cognitive concepts most usefully grasp phenomena in the brain, and much more.


Subjects
Data Science , Neurosciences , Brain , Language , Machine Learning
19.
J Biomed Inform ; 151: 104602, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38346530

ABSTRACT

OBJECTIVE: An applied problem facing all areas of data science is harmonizing data sources. Joining data from multiple origins with unmapped and only partially overlapping features is a prerequisite to developing and testing robust, generalizable algorithms, especially in healthcare. This integration is usually resolved using meta-data such as feature names, which may be unavailable or ambiguous. Our goal is to design methods that create a mapping between structured tabular datasets derived from electronic health records independent of meta-data. METHODS: We evaluate methods in the challenging case of numeric features without reliable and distinctive univariate summaries, such as nearly Gaussian and binary features. We assume that a small set of features are a priori mapped between two datasets, which share unknown identical features and possibly many unrelated features. We expect inter-feature relationships to be the main source of identification. We compare the performance of contrastive learning methods for feature representations, novel partial auto-encoders, mutual-information graph optimizers, and simple statistical baselines on simulated data, public datasets, the MIMIC-III medical-record changeover, and perioperative records from before and after a medical-record system change. Performance was evaluated using both mapping of identical features and reconstruction accuracy of examples in the format of the other dataset. RESULTS: Contrastive learning-based methods overall performed the best, often substantially beating the literature baseline in matching and reconstruction, especially in the more challenging real data experiments. Partial auto-encoder methods showed on-par matching with contrastive methods in all synthetic and some real datasets, along with good reconstruction. However, the statistical method we created performed reasonably well in many cases, with much less dependence on hyperparameter tuning.
When validating feature match output in the EHR dataset we found that some mistakes were actually a surrogate or related feature as reviewed by two subject matter experts. CONCLUSION: In simulation studies and real-world examples, we find that inter-feature relationships are effective at identifying matching or closely related features across tabular datasets when meta-data is not available. Decoder architectures are also reasonably effective at imputing features without an exact match.


Subjects
Algorithms , Electronic Health Records , Computer Simulation , Data Science , Motivation
20.
Redox Biol ; 70: 103061, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38341954

ABSTRACT

RATIONALE: MER proto-oncogene tyrosine kinase (MerTK) is a key receptor for the clearance of apoptotic cells (efferocytosis) and plays important roles in redox-related human diseases. We explore MerTK biology in human cells, tissues, and diseases based on big data analytics. METHODS: The human RNA-seq and scRNA-seq data, covering about 42,700 samples, were obtained from NCBI Gene Expression Omnibus and analyzed by QIAGEN Ingenuity Pathway Analysis (IPA) with about 170,000 crossover analyses. MerTK expression was quantified as Log2 (FPKM + 0.1). RESULTS: We found that, in human cells, MerTK is highly expressed in macrophages, monocytes, progenitor cells, alpha-beta T cells, plasma B cells, myeloid cells, and endothelial cells (ECs). In human tissues, MerTK has higher expression in plaque, blood vessels, heart, liver, sensory system, artificial tissue, bone, adrenal gland, central nervous system (CNS), and connective tissue. Compared to normal conditions, MerTK expression in related tissues is altered in many human diseases, including cardiovascular diseases, cancer, and brain disorders. Interestingly, MerTK expression also shows sex differences in many tissues, indicating that MerTK may have different impacts in males and females. Finally, based on our proteomics from primary human aortic ECs, we validated the functions of MerTK in several human diseases, such as cancer, aging, kidney failure and heart failure. CONCLUSIONS: Our big data analytics suggest that MerTK may be a promising therapeutic target, but how it should be modulated depends on the disease type and on sex differences. For example, MerTK inhibition emerges as a new strategy for cancer therapy due to its counteracting effect on anti-tumor immunity, while MerTK restoration represents a promising treatment for atherosclerosis and myocardial infarction, as MerTK is cleaved in these disease conditions.


Subjects
Receptor Protein-Tyrosine Kinases , c-Mer Tyrosine Kinase , Female , Humans , Male , Apoptosis/genetics , c-Mer Tyrosine Kinase/genetics , Data Science , Endothelial Cells/metabolism , Genomics , Neoplasms/metabolism , Phagocytosis , Proto-Oncogene Proteins/genetics , Proto-Oncogene Proteins/metabolism , Receptor Protein-Tyrosine Kinases/genetics , Receptor Protein-Tyrosine Kinases/metabolism , Brain Diseases/metabolism
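The abstract above quantifies expression as Log2 (FPKM + 0.1). A small sketch of that transform follows, mainly to show why the pseudocount is there: it keeps the logarithm defined when FPKM is zero. The function name and the sample FPKM values are hypothetical; only the formula itself comes from the abstract.

```python
from math import log2

PSEUDOCOUNT = 0.1  # the offset stated in the abstract


def log2_expression(fpkm, pseudocount=PSEUDOCOUNT):
    """Return log2(FPKM + pseudocount); FPKM must be non-negative."""
    if fpkm < 0:
        raise ValueError("FPKM cannot be negative")
    return log2(fpkm + pseudocount)


# Hypothetical FPKM values, invented for illustration only.
for fpkm in (0.0, 0.9, 7.9):
    print(f"FPKM {fpkm:>4} -> {log2_expression(fpkm):.2f}")
```

With this offset, an unexpressed gene maps to log2(0.1), roughly -3.32, rather than to negative infinity, so it can still be plotted and compared across samples.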