Results 1 - 20 of 312
1.
Nat Rev Genet ; 21(10): 615-629, 2020 10.
Article in English | MEDLINE | ID: mdl-32694666

ABSTRACT

Data sharing anchors reproducible science, but expectations and best practices are often nebulous. Communities of funders, researchers and publishers continue to grapple with what should be required or encouraged. To illuminate the rationales for sharing data, the technical challenges and the social and cultural challenges, we consider the stakeholders in the scientific enterprise. In biomedical research, participants are key among those stakeholders. Ethical sharing requires considering both the value of research efforts and the privacy costs for participants. We discuss current best practices for various types of genomic data, as well as opportunities to promote ethical data sharing that accelerates science by aligning incentives.


Subjects
Biomedical Research/methods , Biomedical Research/trends , Genomics/ethics , Information Dissemination/ethics , Research Personnel/trends , Cooperative Behavior , Humans , Privacy
2.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36752347

ABSTRACT

Alzheimer's disease (AD) is one of the most challenging neurodegenerative diseases because of its complicated, progressive mechanisms and multiple risk factors. Increasing research evidence demonstrates that genetics may be a key factor responsible for the occurrence of the disease. Although previous reports identified quite a few AD-associated genes, they were mostly limited by patient sample size and selection bias. There is a lack of comprehensive research aimed at identifying AD-associated risk mutations systematically. To address this challenge, we hereby construct a large-scale AD mutation and co-mutation framework ('AD-Syn-Net'), and propose deep learning models named Deep-SMCI and Deep-CMCI, configured with fully connected layers, that can effectively predict cognitive impairment of subjects based on genetic mutation and co-mutation profiles. Next, we apply the customized frameworks to data sets to evaluate the importance scores of the mutations and identify mutation effectors and co-mutation combination vulnerabilities contributing to cognitive impairment. Furthermore, we evaluate the influence of mutation pairs on the network architecture to dissect the genetic organization of AD and identify novel co-mutations that could be responsible for dementia, laying a solid foundation for future targeted therapy in AD precision medicine. Our deep learning model code is openly available at https://github.com/Pan-Bio/AD-mutation-effectors.
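As a rough illustration of the kind of fully connected scoring the abstract describes (the loci, layer sizes and weights below are invented for demonstration, not the authors' Deep-SMCI parameters), a forward pass over a binary mutation profile might look like:

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_impairment(mutations, w_hidden, b_hidden, w_out, b_out):
    """Map a binary mutation profile to an impairment probability
    with one hidden ReLU layer and a sigmoid output."""
    hidden = [relu(sum(w * x for w, x in zip(row, mutations)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

# Toy profile over three hypothetical loci (1 = mutated) and made-up weights.
profile = [1, 0, 1]
p = predict_impairment(profile,
                       w_hidden=[[0.5, -0.2, 0.1], [0.3, 0.4, -0.1]],
                       b_hidden=[0.0, 0.1],
                       w_out=[1.0, -1.0], b_out=0.0)
print(round(p, 3))  # a probability in (0, 1)
```

A trained model would learn the weights from labeled cohorts; the point here is only the shape of the computation from profile to probability.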


Subjects
Alzheimer Disease , Cognitive Dysfunction , Deep Learning , Humans , Alzheimer Disease/genetics , Magnetic Resonance Imaging , Cognitive Dysfunction/genetics , Mutation
3.
Bioinformatics ; 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39115390

ABSTRACT

SUMMARY: The vast generation of genetic data poses a significant challenge in efficiently uncovering valuable knowledge. Introducing GENEVIC, an AI-driven chat framework that tackles this challenge by bridging the gap between genetic data generation and biomedical knowledge discovery. Leveraging generative AI, notably ChatGPT, it serves as a biologist's 'copilot'. It automates the analysis, retrieval, and visualization of customized domain-specific genetic information, and integrates functionalities to generate protein interaction networks, enrich gene sets, and search scientific literature from PubMed, Google Scholar, and arXiv, making it a comprehensive tool for biomedical research. In its pilot phase, GENEVIC is assessed using a curated database that ranks genetic variants associated with Alzheimer's disease, schizophrenia, and cognition, based on their effect weights from the Polygenic Score (PGS) Catalog, thus enabling researchers to prioritize genetic variants in complex diseases. GENEVIC's operation is user-friendly, accessible without any specialized training, secured by Azure OpenAI's HIPAA-compliant infrastructure, and evaluated for its efficacy through real-time query testing. As a prototype, GENEVIC is set to advance genetic research, enabling informed biomedical decisions. AVAILABILITY AND IMPLEMENTATION: GENEVIC is publicly accessible at https://genevic-anath2024.streamlit.app. The underlying code is open-source and available via GitHub at https://github.com/bsml320/GENEVIC.git (also at https://github.com/anath2110/GENEVIC.git). SUPPLEMENTARY INFORMATION: Available at Bioinformatics online and at https://github.com/bsml320/GENEVIC_Supplementary.git (also at https://github.com/anath2110/GENEVIC_Supplementary.git).
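The variant-prioritization step described above boils down to ranking variants by their PGS Catalog effect weights. A schematic sketch (the rsIDs and weights are fabricated):

```python
# Hypothetical PGS-Catalog-style records: (variant, effect weight).
variants = [("rs0001", 0.12), ("rs0002", -0.45), ("rs0003", 0.07)]

def prioritize(records):
    """Rank variants by absolute effect weight, largest first."""
    return sorted(records, key=lambda r: abs(r[1]), reverse=True)

for rsid, weight in prioritize(variants):
    print(rsid, weight)
```

In GENEVIC this ranking is wrapped behind a chat interface; the underlying ordering principle is this simple.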

4.
PLoS Comput Biol ; 20(7): e1012142, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39047024

ABSTRACT

Increasing genetic and phenotypic data size is critical for understanding the genetic determinants of diseases. However, the lack of practical means for collaboration and data sharing among institutions is a fundamental methodological barrier to performing high-powered studies. As sample sets become more heterogeneous, complex statistical approaches, such as generalized linear mixed-effects models, must be used to correct for confounders that may bias results. Moreover, owing to privacy concerns around Protected Health Information (PHI), sharing of genetic information is tightly restricted by regulations such as the Health Insurance Portability and Accountability Act (HIPAA). This limits data sharing among institutions and hampers efforts to execute high-powered collaborative studies. Federated approaches are promising for alleviating the issues around privacy and performance, since sensitive data never leave the local sites. Motivated by these challenges, we developed FedGMMAT, a federated genetic association testing tool that uses a federated statistical testing approach for efficient association tests that can correct for confounding fixed and additive polygenic random effects among different collaborating sites. Genetic data are never shared among collaborating sites, and the intermediate statistics are protected by encryption. Using simulated and real datasets, we demonstrate that FedGMMAT achieves virtually the same results as pooled analysis under a privacy-preserving framework with practical resource requirements.
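FedGMMAT itself fits generalized linear mixed models with encrypted intermediate statistics; a far simpler way to see the federated principle (only aggregates leave each site) is an allelic chi-square test assembled by a coordinator from per-site counts. All counts below are invented:

```python
def site_counts(case_alt, case_total, ctrl_alt, ctrl_total):
    """Each site shares only aggregate allele counts, never genotypes."""
    return {"ca": case_alt, "cn": case_total,
            "ta": ctrl_alt, "tn": ctrl_total}

def federated_chi2(sites):
    """Pool per-site counts and compute a 2x2 allelic chi-square."""
    a = sum(s["ca"] for s in sites)      # case alt alleles
    b = sum(s["cn"] for s in sites) - a  # case ref alleles
    c = sum(s["ta"] for s in sites)      # control alt alleles
    d = sum(s["tn"] for s in sites) - c  # control ref alleles
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

sites = [site_counts(30, 100, 20, 100), site_counts(25, 80, 15, 90)]
print(round(federated_chi2(sites), 3))
```

Because the counts are additive, pooling aggregates gives exactly the statistic a single-site analysis of the combined data would produce; FedGMMAT generalizes this idea to mixed models with encrypted intermediates.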


Subjects
Information Dissemination , Humans , Linear Models , Information Dissemination/methods , Computational Biology/methods , Software , Genome-Wide Association Study/methods , Genetic Association Studies
5.
BMC Bioinformatics ; 25(1): 250, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39080535

ABSTRACT

BACKGROUND: The potential benefits of drug combination synergy in cancer medicine are significant, yet the risks must be carefully managed due to the possibility of increased toxicity. Although artificial intelligence applications have demonstrated notable success in predicting drug combination synergy, several key challenges persist: (1) existing models often predict average synergy values across a restricted range of testing dosages, neglecting crucial dose amounts and the mechanisms of action of the drugs involved; (2) many graph-based models rely on static protein-protein interactions, failing to adapt to dynamic and higher-order relationships. These limitations constrain the applicability of current methods. RESULTS: We introduce SAFER, a Sub-hypergraph Attention-based graph model, addressing these issues by incorporating complex relationships among biological knowledge networks and considering dosing effects on subject-specific networks. SAFER outperformed previous models on the benchmark and the independent test set. The analysis of subgraph attention weights for the lung cancer cell line highlighted the JAK-STAT signaling pathway, PRDM12, ZNF781 and CDC5L, which have been implicated in lung fibrosis. CONCLUSIONS: SAFER presents an interpretable framework designed to identify drug-responsive signals. Tailored for comprehending dose effects on subject-specific molecular contexts, our model uniquely captures dose-level drug combination responses. This capability unlocks avenues of investigation inaccessible to earlier models. Furthermore, the SAFER framework can be leveraged in future inquiries to investigate molecular networks that uniquely characterize individual patients, and can be applied to prioritize personalized effective treatment based on safe dose combinations.


Subjects
Neural Networks, Computer , Humans , Cell Line, Tumor , Drug Synergism , Lung Neoplasms/drug therapy , Lung Neoplasms/metabolism , Dose-Response Relationship, Drug , Signal Transduction/drug effects , Antineoplastic Agents/pharmacology
6.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36384083

ABSTRACT

BACKGROUND: Estimation of genetic relatedness, or kinship, is used occasionally for recreational purposes and in forensic applications. While numerous methods have been developed to estimate kinship, they suffer from high computational requirements and often make an untenable assumption of homogeneous population ancestry of the samples. Moreover, genetic privacy is generally overlooked in the usage of kinship estimation methods. There can be ethical concerns about finding unknown familial relationships in third-party databases. Similar ethical concerns arise when estimating and reporting sensitive population-level statistics such as inbreeding coefficients, given the risks of marginalization and stigmatization. RESULTS: Here, we present SIGFRIED, which makes use of existing reference panels with a projection-based approach that simplifies kinship estimation in admixed populations. We use simulated and real datasets to demonstrate the accuracy and efficiency of kinship estimation. We present a secure federated kinship estimation framework and implement a secure kinship estimator using homomorphic encryption-based primitives for computing relatedness between samples at two different sites while genotype data are kept confidential. Source code and documentation for our methods can be found at https://doi.org/10.5281/zenodo.7053352. CONCLUSIONS: Analysis of relatedness is fundamentally important for identifying relatives, in association studies, and for estimating population-level measures of inbreeding. As awareness of individual and group genomic privacy grows, privacy-preserving methods for the estimation of relatedness are needed. The presented methods alleviate the ethical and privacy concerns in the analysis of relatedness in admixed, historically isolated and underrepresented populations.
SHORT ABSTRACT: Genetic relatedness is a central quantity used for finding relatives in databases, correcting biases in genome-wide association studies and estimating population-level statistics. Methods for estimating genetic relatedness have high computational requirements and occasionally do not consider individuals from admixed ancestries. Furthermore, the ethical concerns around using genetic data and calculating relatedness are not considered. We present a projection-based approach that can efficiently and accurately estimate kinship. We implement our method using encryption-based techniques that provide provable security guarantees to protect genetic data while kinship statistics are computed among multiple sites.
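For context, a textbook (non-private, non-projection) kinship estimator standardizes genotypes by assumed population allele frequencies; SIGFRIED's contribution is doing this efficiently and securely for admixed samples. A minimal sketch of the classical estimator:

```python
def relatedness(g_i, g_j, freqs):
    """GRM-style relatedness between two genotype vectors (0/1/2 alt-allele
    counts), standardized by per-variant reference allele frequencies."""
    terms = [(gi - 2 * p) * (gj - 2 * p) / (2 * p * (1 - p))
             for gi, gj, p in zip(g_i, g_j, freqs)]
    return sum(terms) / len(terms)

# With p = 0.5 everywhere, identical homozygous profiles score high...
print(relatedness([2, 0], [2, 0], [0.5, 0.5]))   # 2.0
# ...and opposite profiles score negative.
print(relatedness([2, 0], [0, 2], [0.5, 0.5]))   # -2.0
```

The homogeneous-ancestry assumption criticized in the abstract enters exactly through the single frequency vector `freqs`; admixture makes those frequencies sample-dependent, which is what the projection approach addresses.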


Subjects
Genome-Wide Association Study , Privacy , Humans , Genotype , Genetic Privacy , Genome
7.
Bioinformatics ; 39(Suppl 1): i168-i176, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387172

ABSTRACT

The rapid improvements in genomic sequencing technology have led to the proliferation of locally collected genomic datasets. Given the sensitivity of genomic data, it is crucial to conduct collaborative studies while preserving the privacy of the individuals. However, before starting any collaborative research effort, the quality of the data needs to be assessed. One of the essential steps of the quality control process is population stratification: identifying genetic differences among individuals due to subpopulations. One of the common methods used to group genomes of individuals based on ancestry is principal component analysis (PCA). In this article, we propose a privacy-preserving framework which utilizes PCA to assign individuals to populations across multiple collaborators as part of the population stratification step. In our proposed client-server-based scheme, we initially let the server train a global PCA model on a publicly available genomic dataset which contains individuals from multiple populations. The global PCA model is later used to reduce the dimensionality of the local data by each collaborator (client). After adding noise to achieve local differential privacy (LDP), the collaborators send metadata (in the form of their local PCA outputs) about their research datasets to the server, which then aligns the local PCA results to identify the genetic differences among collaborators' datasets. Our results on real genomic data show that the proposed framework can perform population stratification analysis with high accuracy while preserving the privacy of the research participants.
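The client-side step, projecting local genotypes with the server's global PCA loadings and then perturbing the result with Laplace noise for LDP, can be sketched as follows (the loadings, sensitivity and epsilon are placeholder values, not the paper's calibration):

```python
import math
import random

def laplace(rng, scale):
    """Draw one Laplace(0, scale) sample by inverse-CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def project(x, components):
    """Reduce a sample with precomputed global PCA loadings."""
    return [sum(c * v for c, v in zip(comp, x)) for comp in components]

def ldp_project(x, components, epsilon, sensitivity, rng):
    """Perturb each projected coordinate with Laplace noise for LDP."""
    return [p + laplace(rng, sensitivity / epsilon)
            for p in project(x, components)]

loadings = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for the server's global model
noisy = ldp_project([1.0, 2.0], loadings, epsilon=1.0,
                    sensitivity=0.5, rng=random.Random(42))
print(noisy)
```

Only `noisy` would be sent to the server; smaller epsilon means more noise and stronger local privacy at the cost of alignment accuracy.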


Subjects
Genomics , Privacy , Humans , Chromosome Mapping , Metadata , Principal Component Analysis
8.
Bioinformatics ; 39(10)2023 10 03.
Article in English | MEDLINE | ID: mdl-37856329

ABSTRACT

MOTIVATION: Genome-wide association studies (GWAS) benefit from the increasing availability of genomic data and cross-institution collaborations. However, sharing data across institutional boundaries jeopardizes medical data confidentiality and patient privacy. While modern cryptographic techniques provide formal security guarantees, the substantial communication and computational overheads hinder the practical application of large-scale collaborative GWAS. RESULTS: This work introduces an efficient framework for conducting collaborative GWAS on distributed datasets, maintaining data privacy without compromising the accuracy of the results. We propose a novel two-step strategy aimed at reducing communication and computational overheads, and we employ iterative and sampling techniques to ensure accurate results. We instantiate our approach using logistic regression, a commonly used statistical method for identifying associations between genetic markers and the phenotype of interest. We evaluate our proposed methods using two real genomic datasets and demonstrate their robustness in the presence of between-study heterogeneity and skewed phenotype distributions using a variety of experimental settings. The empirical results show the efficiency and applicability of the proposed method and its promise for large-scale collaborative GWAS. AVAILABILITY AND IMPLEMENTATION: The source code and data are available at https://github.com/amioamo/TDS.
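The paper's two-step strategy is more elaborate, but the core of collaborative logistic regression without sharing records can be illustrated by one federated gradient step in which each site returns only its average gradient (toy data, scalar feature):

```python
import math

def local_gradient(data, w):
    """Per-site average gradient of the logistic loss (scalar feature)."""
    g = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-w * x))
        g += (p - y) * x
    return g / len(data), len(data)

def federated_step(sites, w, lr):
    """Server combines site gradients weighted by sample count, then steps."""
    grads = [local_gradient(d, w) for d in sites]
    n = sum(count for _, count in grads)
    g = sum(gi * count for gi, count in grads) / n
    return w - lr * g

sites = [[(1.0, 1), (1.0, 0)], [(2.0, 1)]]  # toy per-site (x, y) records
w1 = federated_step(sites, w=0.0, lr=0.3)
print(round(w1, 3))  # 0.1
```

Because the weighted average of per-site gradients equals the pooled gradient, iterating this step converges to the same coefficients as centralized training; the actual framework additionally reduces how many such rounds and messages are needed.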


Subjects
Genome-Wide Association Study , Privacy , Humans , Genome-Wide Association Study/methods , Genomics/methods , Confidentiality , Software
9.
J Biomed Inform ; 149: 104545, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37992791

ABSTRACT

Liver transplantation is a life-saving procedure for patients with end-stage liver disease. There are two main challenges in liver transplant: finding the best-matching patient for a donor and ensuring transplant equity among different subpopulations. The current MELD scoring system evaluates a patient's mortality risk if not receiving an organ within 90 days. However, donor-patient matching should also consider post-transplant risk factors, such as cardiovascular disease and chronic rejection, which are common complications after transplant. Accurate prediction of these risk scores remains a significant challenge. In this study, we used predictive models to address the above challenges. Specifically, we proposed a deep learning model to predict multiple risk factors after a liver transplant. By formulating it as a multi-task learning problem, the proposed deep neural network was trained to simultaneously predict the five post-transplant risks and achieve equally good performance by exploiting task-balancing techniques. We also proposed a novel fairness-achieving algorithm to ensure prediction fairness across different subpopulations. We used electronic health records of 160,360 liver transplant patients, including demographic information, clinical variables, and laboratory values, collected from the liver transplant records of the United States from 1987 to 2018. The model's performance was evaluated using metrics such as AUROC and AUPRC. Our experimental results highlight the success of our multi-task model in achieving task balance while maintaining accuracy: the model significantly reduced the task discrepancy by 39%. Further application of the fairness-achieving algorithm substantially reduced fairness disparity among all sensitive attributes (gender, age group, and race/ethnicity) in each risk factor. These results underline the value of integrating fairness considerations into the task-balancing framework, ensuring robust and fair predictions across multiple tasks and diverse demographic groups.


Subjects
Deep Learning , Liver Transplantation , Humans , United States , Tissue Donors , Neural Networks, Computer , Risk Factors
10.
J Biomed Inform ; 151: 104606, 2024 03.
Article in English | MEDLINE | ID: mdl-38325698

ABSTRACT

Electronic health records (EHRs) store an extensive array of patient information, encompassing medical histories, diagnoses, treatments, and test outcomes. These records are crucial for enabling healthcare providers to make well-informed decisions regarding patient care. Summarizing clinical notes further assists healthcare professionals in pinpointing potential health risks and making better-informed decisions. This process contributes to reducing errors and enhancing patient outcomes by ensuring providers have access to the most pertinent and current patient data. Recent research has shown that incorporating instruction prompts with large language models (LLMs) substantially boosts the efficacy of summarization tasks. However, we show that this approach also leads to increased performance variance, resulting in significantly distinct summaries even when instruction prompts share similar meanings. To tackle this challenge, we introduce a model-agnostic Soft Prompt-Based Calibration (SPeC) pipeline that employs soft prompts to lower variance while preserving the advantages of prompt-based summarization. Experimental findings on multiple clinical note tasks and LLMs indicate that our method not only bolsters performance but also effectively regulates variance across different LLMs, providing a more consistent and reliable approach to summarizing critical medical information.


Subjects
Electronic Health Records , Natural Language Processing , Humans , Calibration , Language , Health Personnel
11.
J Clin Periodontol ; 51(5): 547-557, 2024 May.
Article in English | MEDLINE | ID: mdl-38212876

ABSTRACT

AIM: To develop and validate an automated electronic health record (EHR)-based algorithm to suggest a periodontal diagnosis based on the 2017 World Workshop on the Classification of Periodontal Diseases and Conditions. MATERIALS AND METHODS: Using material published from the 2017 World Workshop, a tool was iteratively developed to suggest a periodontal diagnosis based on clinical data within the EHR. Pertinent clinical data included clinical attachment level (CAL), gingival margin to cemento-enamel junction distance, probing depth, furcation involvement (if present) and mobility. Chart reviews were conducted to confirm the algorithm's ability to accurately extract clinical data from the EHR, and then to test its ability to suggest an accurate diagnosis. Subsequently, refinements were made to address limitations of the data and specific clinical situations. Each refinement was evaluated through chart reviews by expert periodontists at the study sites. RESULTS: Three hundred and twenty-three charts were manually reviewed, and a periodontal diagnosis (healthy, gingivitis or periodontitis including stage and grade) was made by expert periodontists for each case. After developing the initial version of the algorithm using the unmodified 2017 World Workshop criteria, accuracy was 71.8% for stage alone and 64.7% for stage and grade. Subsequently, 16 modifications to the algorithm were proposed and 14 were accepted. This refined version of the algorithm had 79.6% accuracy for stage alone and 68.8% for stage and grade together. CONCLUSIONS: Our findings suggest that a rule-based algorithm for suggesting a periodontal diagnosis using EHR data can be implemented with moderate accuracy in support of chairside clinical diagnostic decision-making, especially for inexperienced clinicians. Grey-zone cases still exist, where clinical judgement will be required. Future applications of similar algorithms with improved performance will depend upon the quality (completeness/accuracy) of EHR data.
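To make the rule-based idea concrete, here is a deliberately oversimplified staging sketch from just two inputs, loosely following the 2017 workshop CAL thresholds; the study's actual algorithm uses many more EHR fields (probing depth, furcation involvement, mobility) plus 14 accepted refinements, so this is illustrative only and not clinical guidance:

```python
def suggest_stage(max_interdental_cal_mm, teeth_lost_to_periodontitis):
    """Very rough periodontitis stage suggestion from two inputs only."""
    if max_interdental_cal_mm < 1:
        return "healthy/gingivitis (no attachment loss)"
    if max_interdental_cal_mm <= 2:
        return "Stage I"
    if max_interdental_cal_mm <= 4:
        return "Stage II"
    # CAL >= 5 mm: periodontitis-related tooth loss separates III from IV.
    return "Stage IV" if teeth_lost_to_periodontitis >= 5 else "Stage III"

print(suggest_stage(4, 0))  # Stage II
print(suggest_stage(6, 5))  # Stage IV
```

The grey-zone cases the abstract mentions are precisely those where such hard thresholds disagree with clinical judgement, which is why the authors' refinements were reviewed by expert periodontists.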


Subjects
Gingivitis , Periodontal Diseases , Periodontitis , Humans , Electronic Health Records , Periodontal Diseases/diagnosis , Algorithms
12.
BMC Med Inform Decis Mak ; 24(1): 147, 2024 May 30.
Article in English | MEDLINE | ID: mdl-38816848

ABSTRACT

BACKGROUND: Securing adequate data privacy is critical for the productive utilization of data. De-identification, involving masking or replacing specific values in a dataset, can damage the dataset's utility. However, finding a reasonable balance between data privacy and utility is not straightforward. Moreover, few studies have investigated how data de-identification efforts affect data analysis results. This study aimed to demonstrate the effect of different de-identification methods on a dataset's utility with a clinical analytic use case and to assess the feasibility of finding a workable tradeoff between data privacy and utility. METHODS: Predictive modeling of emergency department length of stay was used as a data analysis use case. A logistic regression model was developed with 1155 patient cases extracted from a clinical data warehouse of an academic medical center located in Seoul, South Korea. Nineteen de-identified datasets were generated based on various de-identification configurations using ARX, an open-source software for anonymizing sensitive personal data. The variable distributions and prediction results were compared between the de-identified datasets and the original dataset. We examined the association between data privacy and utility to determine whether it is feasible to identify a viable tradeoff between the two. RESULTS: All 19 de-identification scenarios significantly decreased re-identification risk. Nevertheless, the de-identification processes resulted in record suppression and complete masking of variables used as predictors, thereby compromising dataset utility. A significant correlation was observed only between the re-identification reduction rates and the ARX utility scores. CONCLUSIONS: As the importance of health data analysis increases, so does the need for effective privacy protection methods. While existing guidelines provide a basis for de-identifying datasets, achieving a balance between high privacy and utility is a complex task that requires understanding the data's intended use and involving input from data users. This approach could help find a suitable compromise between data privacy and utility.
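A core quantity behind such de-identification tradeoffs is k-anonymity: the size of the smallest group of records sharing the same quasi-identifier values. ARX offers far richer transformations (generalization hierarchies, utility models), but the record-suppression effect described in the RESULTS can be sketched as:

```python
from collections import Counter

def k_anonymity(rows, quasi_ids):
    """Smallest equivalence-class size over the quasi-identifier columns."""
    classes = Counter(tuple(r[q] for q in quasi_ids) for r in rows)
    return min(classes.values())

def suppress(rows, quasi_ids, k):
    """Drop records whose equivalence class is smaller than k."""
    classes = Counter(tuple(r[q] for q in quasi_ids) for r in rows)
    return [r for r in rows
            if classes[tuple(r[q] for q in quasi_ids)] >= k]

rows = [{"age": "30s", "sex": "F", "los": 4},
        {"age": "30s", "sex": "F", "los": 7},
        {"age": "40s", "sex": "M", "los": 2}]
qid = ["age", "sex"]
print(k_anonymity(rows, qid))             # 1: one record is unique
kept = suppress(rows, qid, k=2)
print(len(kept), k_anonymity(kept, qid))  # 2 2
```

The suppressed record is exactly the utility loss the study measures: raising k lowers re-identification risk but removes training data from the downstream model.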


Subjects
Confidentiality , Data Anonymization , Humans , Confidentiality/standards , Emergency Service, Hospital , Length of Stay , Republic of Korea , Male
13.
Molecules ; 29(12)2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38930832

ABSTRACT

In this research, with the aim of developing novel pyrazole oxime ether derivatives possessing potential biological activity, thirty-two pyrazole oxime ethers, each including a substituted pyridine ring, were synthesized and structurally identified through 1H NMR, 13C NMR and HRMS. Bioassay data indicated that most of these compounds exhibited strong insecticidal activity against Mythimna separata, Tetranychus cinnabarinus, Plutella xylostella, and Aphis medicaginis at a dosage of 500 µg/mL, and some title compounds were active towards Nilaparvata lugens at 500 µg/mL. Furthermore, some of the designed compounds had potent insecticidal effects against M. separata, T. cinnabarinus, or A. medicaginis at 100 µg/mL; in particular, the mortalities of compounds 8a, 8c, 8d, 8e, 8f, 8g, 8o, 8s, 8v, 8x, and 8z against A. medicaginis all reached 100%. Even when the dosage was lowered to 20 µg/mL, compound 8s still showed 50% insecticidal activity against M. separata, and compounds 8a, 8e, 8f, 8o, 8v, and 8x displayed inhibition rates of more than 60% against A. medicaginis. These results provide a significant basis for the rational design of biologically active pyrazole oxime ethers in the future.


Subjects
Drug Design , Insecticides , Oximes , Pyrazoles , Pyrazoles/chemistry , Pyrazoles/pharmacology , Pyrazoles/chemical synthesis , Oximes/chemistry , Oximes/pharmacology , Oximes/chemical synthesis , Insecticides/chemistry , Insecticides/chemical synthesis , Insecticides/pharmacology , Animals , Structure-Activity Relationship , Ethers/chemistry , Molecular Structure , Pyridines/chemistry , Pyridines/pharmacology , Pyridines/chemical synthesis , Moths/drug effects
14.
Hum Brain Mapp ; 44(1): 131-141, 2023 01.
Article in English | MEDLINE | ID: mdl-36066186

ABSTRACT

The parahippocampal cortex (PHC) is a vital neural basis of spatial navigation, but its functional role is still unclear. The "contextual hypothesis," which assumes that the PHC participates in processing the spatial association between the landmark and the destination, provides a potential answer to this question. Nevertheless, the hypothesis was previously tested using the picture categorization task, which is only indirectly related to spatial navigation; it still needs to be tested with a navigation-related paradigm. In the current study, we tested the hypothesis with an fMRI experiment in which participants performed a distance estimation task in a virtual environment under three conditions: landmark free (LF), stable landmark (SL), and ambiguous landmark (AL). Analyzing the behavioral data, we found that the presence of an SL improved the participants' performance in distance estimation. Comparing brain activity in the SL-versus-LF contrast as well as the AL-versus-LF contrast, we found that the PHC was activated by the SL rather than the AL when encoding distance. This indicates that the PHC is elicited by strongly associated context and encodes the landmark reference for distance perception. Furthermore, assessing the representational similarity of PHC activity across conditions, we observed high similarity within the same condition but low similarity between conditions. This result indicates that the PHC sustains the contextual information that discriminates between scenes. Our findings provide insights into the neural correlates of landmark information processing from the perspective of the contextual hypothesis.
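The representational-similarity analysis mentioned above typically reduces to correlating activity patterns across conditions. A minimal Pearson-correlation sketch (the activation vectors are made up, not the study's data):

```python
import math

def pearson(u, v):
    """Pearson correlation between two activation patterns."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [x - mu for x in u]
    dv = [x - mv for x in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = (math.sqrt(sum(a * a for a in du))
           * math.sqrt(sum(b * b for b in dv)))
    return num / den

# Patterns from the same condition correlate strongly...
print(pearson([1.0, 2.0, 3.0], [1.2, 1.9, 3.1]))  # close to 1
# ...while patterns from different conditions can anti-correlate.
print(pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # close to -1
```

High within-condition and low between-condition similarity of such coefficients is the pattern the authors interpret as the PHC discriminating between scenes.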


Subjects
Parahippocampal Gyrus , Spatial Navigation , Humans , Parahippocampal Gyrus/diagnostic imaging , Cerebral Cortex , Cognition , Magnetic Resonance Imaging , Brain Mapping
15.
Bioinformatics ; 38(10): 2826-2831, 2022 05 13.
Article in English | MEDLINE | ID: mdl-35561199

ABSTRACT

MOTIVATION: Evaluating the blood-brain barrier (BBB) permeability of drug molecules is a critical step in brain drug development. Traditional methods for the evaluation require complicated in vitro or in vivo testing. Alternatively, in silico predictions based on machine learning have proved to be a cost-efficient way to complement the in vitro and in vivo methods. However, the performance of the established models has been limited by their inability to deal with the interactions between drugs and proteins, which play an important role in the mechanism behind BBB-penetrating behaviors. To address this limitation, we employed the relational graph convolutional network (RGCN) to handle the drug-protein interactions as well as the properties of each individual drug. RESULTS: The RGCN model achieved an overall accuracy of 0.872, an area under the receiver operating characteristic curve (AUROC) of 0.919 and an area under the precision-recall curve (AUPRC) of 0.838 for the testing dataset with the drug-protein interactions and the Mordred descriptors as the input. Introducing drug-drug similarity to connect structurally similar drugs in the data graph further improved the testing results, giving an overall accuracy of 0.876, an AUROC of 0.926 and an AUPRC of 0.865. In particular, the RGCN model was found to greatly outperform the LightGBM base model when evaluated on the drugs whose BBB penetration was dependent on drug-protein interactions. Our model is expected to provide high-confidence predictions of BBB permeability for drug prioritization in the experimental screening of BBB-penetrating drugs. AVAILABILITY AND IMPLEMENTATION: The data and the codes are freely available at https://github.com/dingyan20/BBB-Penetration-Prediction. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
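An RGCN layer aggregates neighbor features separately per relation type, each with its own weight, plus a self-connection. A scalar-feature sketch of one node update (the graph, relation names and weights are illustrative, not the paper's trained model):

```python
def rgcn_update(node, h, edges_by_rel, w_rel, w_self):
    """One RGCN-style update for a single node with scalar features:
    relu(w_self * h[node] + sum over relations r of
         w_rel[r] * mean of h over the node's r-neighbors)."""
    z = w_self * h[node]
    for rel, edges in edges_by_rel.items():
        nbrs = [src for src, dst in edges if dst == node]
        if nbrs:
            z += w_rel[rel] * sum(h[n] for n in nbrs) / len(nbrs)
    return max(0.0, z)  # ReLU

h = {"drugA": 1.0, "drugB": 2.0, "protP": 3.0}
edges = {"binds": [("drugB", "drugA"), ("protP", "drugA")],
         "similar_to": [("drugB", "drugA")]}
out = rgcn_update("drugA", h, edges,
                  w_rel={"binds": 0.5, "similar_to": 2.0}, w_self=1.0)
print(out)  # 1.0*1 + 0.5*(2+3)/2 + 2.0*2 = 6.25
```

The per-relation weights are what let the model treat a drug-protein "binds" edge differently from the drug-drug "similar_to" edges that the abstract reports as improving the results.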


Subjects
Blood-Brain Barrier , Machine Learning , Biological Transport , Blood-Brain Barrier/metabolism , Brain/metabolism , Permeability , Proteins/metabolism
16.
J Biomed Inform ; 137: 104256, 2023 01.
Article in English | MEDLINE | ID: mdl-36455806

ABSTRACT

Big data and (deep) machine learning have become prominent tools in digital medicine, but they focus mainly on association. Intervention in medicine is about causal effects. The average treatment effect has long been studied as a measure of causal effect, assuming that all populations have the same effect size. However, no "one-size-fits-all" treatment seems to work in some complex diseases, and treatment effects may vary by patient. Estimating heterogeneous treatment effects (HTE) may therefore have a high impact on developing personalized treatment. Many advanced machine learning models for estimating HTE have emerged in recent years, but there has been limited translational research into the real-world healthcare domain. To fill the gap, we reviewed and compared eleven recent HTE estimation methodologies, including meta-learners, representation learning models, and tree-based models. We performed a comprehensive benchmark experiment based on nationwide healthcare claims data, with application to Alzheimer's disease drug repurposing. We discuss challenges and opportunities in HTE estimation analysis in the healthcare domain to close the gap between innovative HTE models and their deployment to real-world healthcare problems.
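Among the meta-learners reviewed in such comparisons, the T-learner is the simplest: fit separate outcome models for treated and control units, then subtract. A stratum-mean version on fabricated data:

```python
from collections import defaultdict

def t_learner(records):
    """records: (stratum, treated, outcome) triples. Returns per-stratum
    CATE estimates, using group means as the two 'base learners'."""
    sums = defaultdict(lambda: [0.0, 0])
    for x, t, y in records:
        cell = sums[(x, t)]
        cell[0] += y
        cell[1] += 1
    mean = {key: total / count for key, (total, count) in sums.items()}
    strata = {x for x, _, _ in records}
    return {x: mean[(x, 1)] - mean[(x, 0)] for x in strata}

data = [(0, 1, 2.0), (0, 0, 1.0), (1, 1, 5.0), (1, 0, 2.0)]
cate = t_learner(data)
print(cate[0], cate[1])  # 1.0 3.0
```

Real HTE estimators replace the group means with flexible regressors and address confounding; the heterogeneity itself is just this per-stratum difference in estimated outcomes.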


Subjects
Benchmarking , Machine Learning , Humans , Randomized Controlled Trials as Topic , Causality
17.
J Biomed Inform ; 139: 104269, 2023 03.
Article in English | MEDLINE | ID: mdl-36621750

ABSTRACT

Electronic health records (EHR) are collected as a routine part of healthcare delivery, and have great potential to be utilized to improve patient health outcomes. They contain multiple years of health information to be leveraged for risk prediction, disease detection, and treatment evaluation. However, they do not have a consistent, standardized format across institutions, particularly in the United States, and can present significant analytical challenges: they contain multi-scale data from heterogeneous domains and include both structured and unstructured data. Data for individual patients are collected at irregular time intervals and with varying frequencies. In addition to the analytical challenges, EHR can reflect inequity: patients belonging to different groups will have differing amounts of data in their health records. Many of these issues can contribute to biased data collection. The consequence is that the data for under-served groups may be less informative, partly due to more fragmented care, which can be viewed as a type of missing data problem. For EHR data in this complex form, there is currently no framework for introducing realistic missing values, and there has been little to no work in assessing the impact of missing data in EHR. In this work, we first introduce a terminology to define three levels of EHR data and then propose a novel framework for simulating realistic missing data scenarios in EHR to adequately assess their impact on predictive modeling. We incorporate the use of a medical knowledge graph to capture dependencies between medical events to create a more realistic missing data framework. In an intensive care unit setting, we found that missing data have a greater negative impact on the performance of disease prediction models in groups that tend to have less access to healthcare, or seek less healthcare. We also found that the impact of missing data on disease prediction models is stronger when the knowledge graph framework is used to introduce realistic missing values, as opposed to random event removal.
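The knowledge-graph idea above can be sketched in a few lines: dropping an event also removes downstream events that depend on it (e.g., a lab order and its result), which is what distinguishes the simulated missingness from random event removal. The event names, dependency graph, and cascade rule here are illustrative assumptions, not the paper's actual framework.

```python
import random

# Toy dependency graph between medical events: children depend on parents,
# so removing a parent also removes its dependent events.
KNOWLEDGE_GRAPH = {
    "sepsis_dx": ["lactate_order", "blood_culture"],
    "lactate_order": ["lactate_result"],
    "blood_culture": [],
    "lactate_result": [],
}

def simulate_missingness(events, graph, drop_prob, rng):
    """Drop events at random, then cascade removal to dependent events."""
    dropped = {e for e in events if rng.random() < drop_prob}
    frontier = list(dropped)
    while frontier:  # propagate removal along the dependency graph
        parent = frontier.pop()
        for child in graph.get(parent, []):
            if child in events and child not in dropped:
                dropped.add(child)
                frontier.append(child)
    return [e for e in events if e not in dropped]

record = ["sepsis_dx", "lactate_order", "lactate_result", "blood_culture"]
kept = simulate_missingness(record, KNOWLEDGE_GRAPH, drop_prob=0.3,
                            rng=random.Random(0))
```

Varying `drop_prob` per patient group would mimic the differing data completeness across sub-populations that the study examines.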


Subjects
Delivery of Health Care, Electronic Health Records, Humans, United States, Intensive Care Units
18.
J Biomed Inform ; 143: 104399, 2023 07.
Article in English | MEDLINE | ID: mdl-37211197

ABSTRACT

The emphasis on fairness in predictive healthcare modeling has increased in popularity as an approach for overcoming biases in automated decision-making systems. The aim is to guarantee that sensitive characteristics like gender, race, and ethnicity do not influence prediction outputs. Numerous algorithmic strategies have been proposed to reduce bias in prediction results, mitigate prejudice toward minority groups, and promote prediction fairness. The goal of these strategies is to ensure that model prediction performance does not exhibit significant disparity among sensitive groups. In this study, we propose a novel fairness-achieving scheme based on multitask learning, which fundamentally differs from conventional fairness-achieving techniques, including altering data distributions, constraint optimization through regularizing fairness metrics, or tampering with prediction outcomes. By dividing predictions on different sub-populations into separate tasks, we view the fairness problem as a task-balancing problem. To ensure fairness during the model-training process, we suggest a novel dynamic re-weighting approach. Fairness is achieved by dynamically modifying the gradients of various prediction tasks during neural network back-propagation, and this technique applies to a wide range of fairness criteria. We conduct tests on a real-world use case to predict sepsis patients' mortality risk. Our results show that the approach can reduce the disparity between subgroups by 98% while losing less than 4% of prediction accuracy.
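The task-balancing idea can be illustrated with a minimal sketch: each sub-population is a separate task, and per-task weights are recomputed at every training step so the worse-performing group receives a larger share of the gradient. The toy data and the softmax weighting rule below are assumptions for illustration, not the paper's exact re-weighting scheme.

```python
import math

# Toy per-group data: (x, y) pairs for two groups, fit by y = w * x.
GROUPS = {
    "A": [(1.0, 2.0), (2.0, 4.0)],  # prefers w = 2
    "B": [(1.0, 3.0)],              # prefers w = 3, fewer samples
}

def group_loss_and_grad(w, data):
    loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
    grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
    return loss, grad

def train(groups, reweight, steps=300, lr=0.05):
    w = 0.0
    for _ in range(steps):
        losses, grads = {}, {}
        for g, data in groups.items():
            losses[g], grads[g] = group_loss_and_grad(w, data)
        if reweight:
            # Softmax over group losses: the group currently doing worse
            # gets a larger weight, pulling its task back toward parity.
            z = sum(math.exp(l) for l in losses.values())
            weights = {g: math.exp(l) / z for g, l in losses.items()}
        else:
            weights = {g: 1.0 / len(groups) for g in groups}
        w -= lr * sum(weights[g] * grads[g] for g in groups)
    return w

def disparity(w):
    """Absolute gap between the two groups' losses at parameter w."""
    la, _ = group_loss_and_grad(w, GROUPS["A"])
    lb, _ = group_loss_and_grad(w, GROUPS["B"])
    return abs(la - lb)

w_base = train(GROUPS, reweight=False)
w_fair = train(GROUPS, reweight=True)
```

With uniform weights the larger group dominates; the dynamic weights shrink the loss gap between groups at a small cost in average fit, mirroring the disparity-versus-accuracy trade-off reported in the abstract.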


Assuntos
Learning, Sepsis, Humans, Benchmarking, Minority Groups, Neural Networks, Computer
19.
J Biomed Inform ; 139: 104322, 2023 03.
Article in English | MEDLINE | ID: mdl-36806328

ABSTRACT

Linking data across studies offers an opportunity to enrich data sets and provide a stronger basis for data-driven models for biomedical discovery and/or prognostication. Several techniques to link records have been proposed, and some have been implemented across data repositories holding molecular and clinical data. Not all these techniques guarantee appropriate privacy protection; there are trade-offs between (a) simple strategies that can be associated with data that will be linked and shared with any party and (b) more complex strategies that preserve the privacy of individuals across parties. We propose an intermediary, practical strategy to support linkage in studies that share de-identified data with Data Coordinating Centers. This technology can be extended to support privacy-preserving record linkage across multiple data hubs, considering data coordination centers and their awardees, and generalizes to a hierarchy of entities (e.g., awardees, data coordination centers, data hubs).
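A common building block for this kind of privacy-preserving linkage is a keyed hash of normalized identifiers: each site shares only opaque tokens with the Data Coordinating Center, which links records by token equality without ever seeing raw identifiers. The field choices, normalization, and key handling below are illustrative assumptions, not the protocol proposed in the paper.

```python
import hashlib
import hmac

# Key distributed out of band to participating sites only; the coordinating
# center sees tokens but never the key or the raw identifiers.
SHARED_KEY = b"site-agreement-secret"

def linkage_token(first, last, dob, key=SHARED_KEY):
    """Derive a linkage token from normalized identifying fields."""
    normalized = "|".join(s.strip().lower() for s in (first, last, dob))
    return hmac.new(key, normalized.encode(), hashlib.sha256).hexdigest()

# Each site computes tokens locally and shares only record-ID/token pairs.
site_a = {"rec_001": linkage_token("Ada", "Lovelace", "1815-12-10")}
site_b = {"rec_xyz": linkage_token(" ADA ", "Lovelace", "1815-12-10")}

# The coordinating center links records whose tokens match.
links = [(ra, rb) for ra, ta in site_a.items()
         for rb, tb in site_b.items() if hmac.compare_digest(ta, tb)]
```

Normalization before hashing makes the tokens robust to casing and whitespace differences between sites; using a keyed HMAC rather than a plain hash prevents a party without the key from testing guessed identities against the tokens.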


Subjects
Biomedical Research, Privacy, Humans, Computer Security
20.
Environ Res ; 231(Pt 3): 116258, 2023 08 15.
Article in English | MEDLINE | ID: mdl-37268201

ABSTRACT

Metal oxide modified biochars are increasingly used for intensive agricultural soil remediation, but there has been limited research on their effects on soil phosphorus transformation, soil enzyme activity, microbial community, and plant growth. Two high-performance metal oxide biochars (FeAl-biochar and MgAl-biochar) were investigated for their effects on soil phosphorus availability, fractions, enzyme activity, microbial community, and plant growth in two typical intensive fertile agricultural soils. Adding raw biochar to acidic soil increased NH4Cl-P content, while metal oxide biochar reduced NH4Cl-P content by binding to phosphorus. Original biochar slightly reduced Al-P content in lateritic red soil, while metal oxide biochar increased it. LBC and FBC significantly reduced the Ca2-P and Ca8-P fractions while improving Al-P and Fe-P, respectively. Inorganic phosphorus-solubilizing bacteria increased in abundance with biochar amendment in both soil types, and biochar addition affected soil pH and phosphorus fractions, leading to changes in bacterial growth and community structure. Biochar's microporous structure allowed it to adsorb phosphorus and aluminum ions, making them more available for plants and reducing leaching. In calcareous soils, biochar additions may dominantly increase the Ca (hydro)oxide-bound P or soluble P instead of Fe-P or Al-P through biotic pathways, favoring plant growth. For fertile soil management, LBC biochar is recommended for optimal performance in both reducing P leaching and promoting plant growth, with the mechanisms differing by soil type. This research highlights the potential of metal oxide modified biochars for improving soil fertility and reducing phosphorus leaching, with specific recommendations for their use in different soil types.


Subjects
Soil Pollutants, Soil, Soil/chemistry, Phosphorus, Charcoal/chemistry, Oxides, Soil Pollutants/analysis