1.
Proc Mach Learn Res ; 238: 4195-4203, 2024 May.
Article in English | MEDLINE | ID: mdl-39267895

ABSTRACT

Missing values are prevalent in temporal electronic health records (EHR) data and are known to complicate data analysis and lead to biased results. The current state-of-the-art (SOTA) models for imputing missing values in EHR primarily leverage correlations across time points and across features, which perform well when data have strong correlation across time points, such as in ICU data where high-frequency time series data are collected. However, this is often insufficient for temporal EHR data from non-ICU settings (e.g., outpatient visits for primary care or specialty care), where data are collected at substantially sparser time points, resulting in much weaker correlation across time points. To address this methodological gap, we propose the Similarity-Aware Diffusion Model-Based Imputation (SADI), a novel imputation method that leverages the diffusion model and utilizes information across dependent variables. We apply SADI to impute incomplete temporal EHR data and propose a similarity-aware denoising function, which includes a self-attention mechanism to model the correlations between time points, features, and similar patients. To the best of our knowledge, this is the first time that the information of similar patients is directly used to construct imputation for incomplete temporal EHR data. Our extensive experiments on two datasets, the Critical Path For Alzheimer's Disease (CPAD) data and the PhysioNet Challenge 2012 data, show that SADI outperforms the current SOTA under various missing data mechanisms, including missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).
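The three missing-data mechanisms named in the abstract above differ in what the probability of missingness is allowed to depend on. The sketch below is illustrative only and is not the SADI evaluation protocol: the two-column layout, the `make_mask` helper, and the logistic link are assumptions chosen to make the distinction concrete.

```python
import math
import random


def sigmoid(x):
    """Logistic function, used here as a hypothetical missingness link."""
    return 1.0 / (1.0 + math.exp(-x))


def make_mask(rows, mechanism, rate=0.3, seed=0):
    """Return one boolean per row: True means x1 is masked as missing.

    rows is a list of (x0, x1) pairs. x0 is always observed; x1 may be hidden.
    """
    rng = random.Random(seed)
    mask = []
    for x0, x1 in rows:
        if mechanism == "MCAR":
            p = rate          # independent of every value
        elif mechanism == "MAR":
            p = sigmoid(x0)   # depends only on the observed variable x0
        elif mechanism == "MNAR":
            p = sigmoid(x1)   # depends on the value that is being hidden
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        mask.append(rng.random() < p)
    return mask


# Hypothetical toy data: large x0, very negative x1.
rows = [(50.0, -50.0)] * 5
print(make_mask(rows, "MAR"))
print(make_mask(rows, "MNAR"))
```

Under MAR the mask can be predicted from observed data, whereas under MNAR it depends on the hidden value itself, which is why MNAR is usually the hardest setting for an imputation method.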

2.
Brief Bioinform ; 25(6)2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39344710

ABSTRACT

Epidemiologic and genetic studies in many complex diseases suggest subgroup disparities (e.g. by sex, race) in disease course and patient outcomes. We consider this from the standpoint of integrative analysis where we combine information from different views (e.g. genomics, proteomics, clinical data). Existing integrative analysis methods ignore the heterogeneity in subgroups, and stacking the views and accounting for subgroup heterogeneity does not model the association among the views. We propose Heterogeneity in Integration and Prediction (HIP), a statistical approach for joint association and prediction that leverages the strengths in each view to identify molecular signatures that are shared by and specific to a subgroup. We apply HIP to proteomics and gene expression data pertaining to chronic obstructive pulmonary disease (COPD) to identify proteins and genes shared by, and unique to, males and females, contributing to the variation in COPD, measured by airway wall thickness. Our COPD findings have identified proteins, genes, and pathways that are common across and specific to males and females, some implicated in COPD, while others could lead to new insights into sex differences in COPD mechanisms. HIP accounts for subgroup heterogeneity in multi-view data, ranks variables based on importance, is applicable to univariate or multivariate continuous outcomes, and incorporates covariate adjustment. With the efficient algorithms implemented using PyTorch, this method has many potential scientific applications and could enhance multiomics research in health disparities. HIP is available at https://github.com/lasandrall/HIP, a video tutorial at https://youtu.be/O6E2OLmeMDo and a Shiny Application at https://multi-viewlearn.shinyapps.io/HIP_ShinyApp/ for users with limited programming experience.


Subject(s)
Pulmonary Disease, Chronic Obstructive , Humans , Pulmonary Disease, Chronic Obstructive/genetics , Male , Female , Proteomics/methods , Algorithms , Genomics/methods , Computational Biology/methods
3.
EBioMedicine ; 108: 105333, 2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39321500

ABSTRACT

BACKGROUND: While many patients seem to recover from SARS-CoV-2 infections, many others report experiencing SARS-CoV-2 symptoms for weeks or months after their acute COVID-19 ends, even developing new symptoms weeks after infection. These long-term effects are called post-acute sequelae of SARS-CoV-2 (PASC) or, more commonly, Long COVID. The overall prevalence of Long COVID is currently unknown, and tools are needed to help identify patients at risk for developing Long COVID. METHODS: A working group of the Rapid Acceleration of Diagnostics-radical (RADx-rad) program, composed of individuals from various NIH institutes and centers, in collaboration with REsearching COVID to Enhance Recovery (RECOVER), developed and organized the Long COVID Computational Challenge (L3C), a community challenge aimed at incentivizing the broader scientific community to develop interpretable and accurate methods for identifying patients at risk of developing Long COVID. From August 2022 to December 2022, participants developed Long COVID risk prediction algorithms using the National COVID Cohort Collaborative (N3C) data enclave, a harmonized data repository from over 75 healthcare institutions from across the United States (U.S.). FINDINGS: Over the course of the challenge, 74 teams designed and built 35 Long COVID prediction models using the N3C data enclave. The top 10 teams all scored above a 0.80 Area Under the Receiver Operator Curve (AUROC) with the highest scoring model achieving a mean AUROC of 0.895. Included in the top submission was a visualization dashboard that built timelines for each patient, updating the risk of a patient developing Long COVID in response to clinical events. INTERPRETATION: As a result of L3C, federal reviewers identified multiple machine learning models that can be used to identify patients at risk for developing Long COVID. Many of the teams used approaches in their submissions which can be applied to future clinical prediction questions.
FUNDING: Research reported in this RADx® Rad publication was supported by the National Institutes of Health. Timothy Bergquist, Johanna Loomba, and Emily Pfaff were supported by Axle Subcontract: NCATS-STSS-P00438.
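The AUROC figures reported above have a simple pairwise reading: the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case, with ties counted as one half. The following is a minimal sketch of that definition, not the challenge's scoring code; the `auroc` function name and the toy labels and scores are assumptions for illustration.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 outcomes; scores: predicted risks in the same order.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: three of the four positive/negative pairs are ordered correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The quadratic loop is fine for small examples; for large cohorts a rank-based formulation gives the same value more efficiently.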

4.
Article in English | MEDLINE | ID: mdl-39348263

ABSTRACT

Tensor Canonical Correlation Analysis (TCCA) is a commonly employed statistical method utilized to examine linear associations between two sets of tensor datasets. However, the existing TCCA models fail to adequately address the heterogeneity present in real-world tensor data, such as brain imaging data collected from diverse groups characterized by factors like sex and race. Consequently, these models may yield biased outcomes. In order to surmount this constraint, we propose a novel approach called Multi-Group TCCA (MG-TCCA), which enables the joint analysis of multiple subgroups. By incorporating a dual sparsity structure and a block coordinate ascent algorithm, our MG-TCCA method effectively addresses heterogeneity and leverages information across different groups to identify consistent signals. This novel approach facilitates the quantification of shared and individual structures, reduces data dimensionality, and enables visual exploration. To empirically validate our approach, we conduct a study focused on investigating correlations between two brain positron emission tomography (PET) modalities (AV-45 and FDG) within an Alzheimer's disease (AD) cohort. Our results demonstrate that MG-TCCA surpasses traditional TCCA and Sparse TCCA (STCCA) in identifying sex-specific cross-modality imaging correlations. This heightened performance of MG-TCCA provides valuable insights for the characterization of multimodal imaging biomarkers in AD.

5.
Heliyon ; 10(14): e34444, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39113973

ABSTRACT

Mycobacterium marinum (M. marinum), a slow-growing bacterium found in freshwater and seawater, can cause cutaneous and extracutaneous infections. A fisherwoman with systemic lupus erythematosus (SLE) who presented with chronic polymorphic rashes in a lymphangitic pattern was initially misdiagnosed with sporotrichosis. The final diagnosis of M. marinum and Candida dubliniensis co-infection was confirmed by skin histopathology, pustule culture, MetaCAP sequencing, and the response to combination antibiotic treatment.

6.
Proc Mach Learn Res ; 235: 53597-53618, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39205826

ABSTRACT

Designing faithful yet accurate AI models is challenging, particularly in the field of individual treatment effect estimation (ITE). ITE prediction models deployed in critical settings such as healthcare should ideally be (i) accurate, and (ii) provide faithful explanations. However, current solutions are inadequate: state-of-the-art black-box models do not supply explanations, post-hoc explainers for black-box models lack faithfulness guarantees, and self-interpretable models greatly compromise accuracy. To address these issues, we propose DISCRET, a self-interpretable ITE framework that synthesizes faithful, rule-based explanations for each sample. A key insight behind DISCRET is that explanations can serve dually as database queries to identify similar subgroups of samples. We provide a novel RL algorithm to efficiently synthesize these explanations from a large search space. We evaluate DISCRET on diverse tasks involving tabular, image, and text data. DISCRET outperforms the best self-interpretable models and has accuracy comparable to the best black-box models while providing faithful explanations. DISCRET is available at https://github.com/wuyinjun-1993/DISCRET-ICML2024.

7.
Proc Mach Learn Res ; 247: 5074-5075, 2024.
Article in English | MEDLINE | ID: mdl-39206101

ABSTRACT

Training Deep Neural Networks (DNNs) with adversarial examples often results in poor generalization to test-time adversarial data. This paper investigates this issue, known as adversarially robust generalization, through the lens of Rademacher complexity. Building upon the studies by Khim and Loh (2018); Yin et al. (2019), numerous works have been dedicated to this problem, yet achieving a satisfactory bound remains an elusive goal. Existing works on DNNs either apply to a surrogate loss instead of the robust loss or yield bounds that are notably looser compared to their standard counterparts. In the latter case, the bounds have a higher dependency on the width $m$ of the DNNs or the dimension $d$ of the data, with an extra factor of at least $\mathcal{O}(m)$ or $\mathcal{O}(d)$. This paper presents upper bounds for adversarial Rademacher complexity of DNNs that match the best-known upper bounds in standard settings, as established in the work of Bartlett et al. (2017), with the dependency on width and dimension being $\mathcal{O}(\ln(dm))$. The central challenge addressed is calculating the covering number of adversarial function classes. We aim to construct a new cover that possesses two properties: 1) compatibility with adversarial examples, and 2) precision comparable to covers used in standard settings. To this end, we introduce a new variant of covering number called the uniform covering number, specifically designed and proven to reconcile these two properties. Consequently, our method effectively bridges the gap between Rademacher complexity in robust and standard generalization.

9.
Front Public Health ; 12: 1380884, 2024.
Article in English | MEDLINE | ID: mdl-39050599

ABSTRACT

Background: Achieving a higher level of accessibility and equity in community healthcare services has become a major concern for health service delivery from the perspectives of health planners and policy makers in China. Methods: In this study, we introduced a comprehensive door-to-door (D2D) model and integrated it with open OD API results to compute accessibility to community hospitals precisely across different transport modes. For the D2D public transit mode, we computed the temporal variation and standard deviation of accessibility at different times of the day. Accessibility values for the D2D riding mode, the D2D driving mode, and the simple driving mode were also computed for comparison. Moreover, we used the Lorenz curve and the Gini index to assess differences in the equity of community healthcare across different times and transport modes. Results: The D2D public transit mode exhibits noticeable fluctuations in accessibility and equity based on the time of day. Accessibility and equity were notably influenced by traffic flow between 8 AM and 11 AM, while during the period from 12 PM to 10 PM, the opening hours of community hospitals became a more significant determinant in Nanjing. The moments with the most equitable and inequitable overall spatial layouts were 10 AM and 10 PM, respectively. Among the four transport modes, the traditional simple driving mode exhibited the smallest equity index, with a Gini value of only 0.243. In contrast, the D2D riding mode, while widely preferred for accessing community healthcare services, had the highest Gini value, reaching 0.472. Conclusion: The proposed method, which combines the D2D model with open OD API results, is effective for computing accessibility under real transport modes. The spatial accessibility and equity of community healthcare fluctuate significantly over time. The transport mode is also a significant factor affecting the level of accessibility and equity. These results are helpful to planners and scholars who aim to build comprehensive spatial accessibility and equity models and to optimize the location of public service facilities from the perspective of different temporal scales and a multi-mode transport system.
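The Lorenz curve and Gini index used above to compare equity can be computed directly from a list of accessibility values. The sketch below assumes equal weighting of locations and uses the trapezoidal area under the Lorenz curve; the abstract does not state whether the study weights by population, so this is an illustration of the general technique rather than the authors' procedure, and the function names and inputs are hypothetical.

```python
def lorenz_curve(values):
    """Return Lorenz curve points (cumulative share of units, cumulative share of total)."""
    if not values or any(v < 0 for v in values):
        raise ValueError("values must be non-empty and non-negative")
    xs = sorted(values)
    total = sum(xs)
    if total == 0:
        raise ValueError("values must not all be zero")
    n = len(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, v in enumerate(xs, start=1):
        running += v
        points.append((i / n, running / total))
    return points


def gini_index(values):
    """Gini index = 1 - 2 * (area under the Lorenz curve), using trapezoids."""
    pts = lorenz_curve(values)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return 1.0 - 2.0 * area


# Hypothetical accessibility scores for four locations.
print(gini_index([1, 1, 1, 1]))  # perfectly equal distribution
print(gini_index([0, 0, 0, 1]))  # all accessibility concentrated in one location
```

A value near 0 indicates an even distribution of accessibility, and larger values indicate greater inequality, which is how the Gini values of 0.243 and 0.472 in the abstract should be read.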


Subject(s)
Health Services Accessibility , Transportation , Humans , Health Services Accessibility/statistics & numerical data , China , Transportation/statistics & numerical data , Time Factors , Community Health Services/statistics & numerical data , Hospitals, Community/statistics & numerical data
10.
Nat Commun ; 15(1): 5763, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38982051

ABSTRACT

While high circulating tumor DNA (ctDNA) levels are associated with poor survival for multiple cancers, variant-specific differences in the association of ctDNA levels and survival have not been examined. Here we investigate KRAS ctDNA (ctKRAS) variant-specific associations with overall and progression-free survival (OS/PFS) in first-line metastatic pancreatic ductal adenocarcinoma (mPDAC) for patients receiving chemoimmunotherapy ("PRINCE", NCT03214250), and an independent cohort receiving standard of care (SOC) chemotherapy. For PRINCE, higher baseline plasma levels are associated with worse OS for ctKRAS G12D (log-rank p = 0.0010) but not G12V (p = 0.7101), even with adjustment for clinical covariates. Early, on-therapy clearance of G12D (p = 0.0002), but not G12V (p = 0.4058), strongly associates with OS for PRINCE. Similar results are obtained for the SOC cohort, and for PFS in both cohorts. These results suggest ctKRAS G12D but not G12V as a promising prognostic biomarker for mPDAC and that G12D clearance could also serve as an early biomarker of response.


Subject(s)
Biomarkers, Tumor , Carcinoma, Pancreatic Ductal , Circulating Tumor DNA , Pancreatic Neoplasms , Proto-Oncogene Proteins p21(ras) , Humans , Carcinoma, Pancreatic Ductal/genetics , Carcinoma, Pancreatic Ductal/mortality , Carcinoma, Pancreatic Ductal/blood , Carcinoma, Pancreatic Ductal/pathology , Carcinoma, Pancreatic Ductal/drug therapy , Proto-Oncogene Proteins p21(ras)/genetics , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/blood , Pancreatic Neoplasms/mortality , Pancreatic Neoplasms/pathology , Pancreatic Neoplasms/drug therapy , Female , Male , Circulating Tumor DNA/blood , Circulating Tumor DNA/genetics , Middle Aged , Aged , Biomarkers, Tumor/blood , Biomarkers, Tumor/genetics , Prognosis , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Mutation , Progression-Free Survival , Neoplasm Metastasis
11.
Photodiagnosis Photodyn Ther ; 49: 104270, 2024 Jul 11.
Article in English | MEDLINE | ID: mdl-39002834

ABSTRACT

PURPOSE: This cross-sectional study measured retinal vessel density (VD) in patients with digestive tract malignancy by optical coherence tomography angiography (OCTA) and compared it with that of healthy controls to explore retinal microcirculation changes in patients with digestive tract malignancy. METHODS: A total of 106 eligible participants were divided into three groups: a gastric cancer (GC) group (36 individuals), a colorectal cancer (CRC) group (34 individuals), and a healthy control group (36 individuals). Angio 6 × 6 512 × 512 R4 and ONH Angio 6 × 6 512 × 512 R4 modes were used to collect retinal vessel density data centered on the fovea and the optic papilla, respectively. The retina was automatically segmented into different layers (superficial vascular plexus (SVP), the inner retinal layer, radial peripapillary capillary plexus (RPCP), deep vascular plexus (DVP)) and areas for analysis. RESULTS: In the optic nerve head (ONH) region, the VD of the inner retinal layer increased in both the GC and CRC groups in all quadrants and areas. In the papillary area, VD in the inner retinal layer, SVP, and RPCP increased in the GC and CRC groups. In the parapapillary area, VD in the inner retinal layer increased in the GC and CRC groups. Significant increases in global VD were found in the RPCP and SVP of the GC group. In the macular region, no statistically significant differences were observed in any layer. CONCLUSIONS: The study suggests that retinal vessel density changes in patients with digestive tract malignancy, especially in the inner retinal layer of the ONH region, revealing a potential relationship between gastrointestinal cancer and retinal microcirculation.

12.
AMIA Jt Summits Transl Sci Proc ; 2024: 211-220, 2024.
Article in English | MEDLINE | ID: mdl-38827072

ABSTRACT

Fairness is crucial in machine learning to prevent bias based on sensitive attributes in classifier predictions. However, the pursuit of strict fairness often sacrifices accuracy, particularly when significant prevalence disparities exist among groups, making classifiers less practical. For example, Alzheimer's disease (AD) is more prevalent in women than men, making equal treatment inequitable for females. Accounting for prevalence ratios among groups is essential for fair decision-making. In this paper, we introduce prior knowledge for fairness, which incorporates prevalence ratio information into the fairness constraint within the Empirical Risk Minimization (ERM) framework. We develop the Prior-knowledge-guided Fair ERM (PFERM) framework, aiming to minimize expected risk within a specified function class while adhering to a prior-knowledge-guided fairness constraint. This approach strikes a flexible balance between accuracy and fairness. Empirical results confirm its effectiveness in preserving fairness without compromising accuracy.

13.
Annu Rev Biomed Data Sci ; 7(1): 391-418, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38848574

ABSTRACT

Alzheimer's disease (AD) is a critical national concern, affecting 5.8 million people and costing more than $250 billion annually. However, there is no available cure. Thus, effective strategies are urgently needed to discover AD biomarkers for early disease detection and drug development. In this review, we study AD from a biomedical data scientist's perspective and discuss the four fundamental components in AD research: genetics (G), molecular multiomics (M), multimodal imaging biomarkers (B), and clinical outcomes (O) (collectively referred to as the GMBO framework). We provide a comprehensive review of common statistical and informatics methodologies for each component within the GMBO framework, accompanied by the major findings from landmark AD studies. Our review highlights the potential of multimodal biobank data in addressing key challenges in AD, such as early diagnosis, disease heterogeneity, and therapeutic development. We identify major hurdles in AD research, including data scarcity and complexity, and advocate for enhanced collaboration, data harmonization, and advanced modeling techniques. This review aims to be an essential guide for understanding current biomedical data science strategies in AD research, emphasizing the need for integrated, multidisciplinary approaches to advance our understanding and management of AD.


Subject(s)
Alzheimer Disease , Biomarkers , Alzheimer Disease/genetics , Alzheimer Disease/diagnosis , Alzheimer Disease/metabolism , Humans , Biomarkers/metabolism , Genomics/methods , Biomedical Research/methods , Multiomics
14.
Dis Esophagus ; 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38881278

ABSTRACT

The study aimed to describe the prevalence of lymph node metastases per lymph node station for esophageal squamous cell carcinoma (ESCC) after neoadjuvant treatment. Clinicopathological variables of ESCC patients were retrieved from the prospective database of the Surgical Esophageal Cancer Patient Registry in West China Hospital, Sichuan University. A two-field lymphadenectomy was routinely performed, and an extensive three-field lymphadenectomy was performed if cervical lymph node metastasis was suspected. Lymph node stations, defined according to the AJCC/UICC 8th edition, were investigated separately. The percentage of patients with lymph node metastases at a station was defined as the number of patients with metastatic lymph nodes at that station divided by the number who underwent lymph node dissection there. Data were also analyzed separately according to the pathological response of the primary tumor, neoadjuvant treatment regimens, pretreatment tumor length, and tumor location. Between January 2019 and March 2023, 623 patients who underwent neoadjuvant therapy followed by transthoracic esophagectomy were enrolled. Lymph node metastases were found in 212 patients (34.0%) and were most frequently seen in lymph nodes along the right recurrent nerve (10.1%, 58/575), the paracardial station (11.4%, 67/587), and lymph nodes along the left gastric artery (10.9%, 65/597). For patients with a pretreatment tumor length of >4 cm and for those without a pathological complete response of the primary tumor, the metastatic rate of the right lower cervical paratracheal lymph nodes was 10.9% (10/92) and 10.6% (11/104), respectively. For patients with an upper thoracic tumor, metastatic lymph nodes were most frequently seen along the right recurrent nerve (14.2%, 8/56). For patients with a middle thoracic tumor, metastatic lymph nodes were most commonly seen in the right lower cervical paratracheal lymph nodes (10.3%, 8/78), paracardial lymph nodes (10.2%, 29/285), and lymph nodes along the left gastric artery (10.4%, 30/289). For patients with a lower thoracic tumor, metastatic lymph nodes were most frequently seen in the paracardial station (14.2%, 35/247) and lymph nodes along the left gastric artery (13.1%, 33/252). The study precisely determined the distribution of lymph node metastases in ESCC after neoadjuvant treatment, which may help to optimize the extent of lymphadenectomy in the surgical management of ESCC patients after neoadjuvant therapy.
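The per-station metastasis rate defined above is a ratio with a station-specific denominator: patients with metastatic nodes at a station divided by patients dissected at that station. A minimal sketch of that arithmetic follows, using three count pairs quoted in the abstract; the `metastasis_rate` helper and the dictionary layout are assumptions for illustration.

```python
def metastasis_rate(n_metastatic, n_dissected):
    """Percentage of dissected patients with metastatic nodes at one station."""
    if n_dissected <= 0:
        raise ValueError("n_dissected must be positive")
    if not 0 <= n_metastatic <= n_dissected:
        raise ValueError("n_metastatic must lie between 0 and n_dissected")
    return 100.0 * n_metastatic / n_dissected


# (patients with metastasis, patients dissected) per station, as quoted in the abstract.
stations = {
    "right recurrent nerve": (58, 575),
    "paracardial": (67, 587),
    "left gastric artery": (65, 597),
}

for name, (metastatic, dissected) in stations.items():
    print(f"{name}: {metastatic_rate:.1f}%" if False else
          f"{name}: {metastasis_rate(metastatic, dissected):.1f}%")
```

Because each station has its own denominator, rates are comparable across stations even though not every patient had every station dissected.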

15.
Zhongguo Dang Dai Er Ke Za Zhi ; 26(5): 450-455, 2024 May 15.
Article in Chinese | MEDLINE | ID: mdl-38802903

ABSTRACT

OBJECTIVES: To investigate the incidence rate, clinical characteristics, and prognosis of neonatal stroke in Shenzhen, China. METHODS: Led by Shenzhen Children's Hospital, the Shenzhen Neonatal Data Collaboration Network organized 21 institutions to collect 36 cases of neonatal stroke from January 2020 to December 2022. The incidence, clinical characteristics, treatment, and prognosis of neonatal stroke in Shenzhen were analyzed. RESULTS: The incidence rate of neonatal stroke in 21 hospitals from 2020 to 2022 was 1/15,137, 1/6,060, and 1/7,704, respectively. Ischemic stroke accounted for 75% (27/36); boys accounted for 64% (23/36). Among the 36 neonates, 31 (86%) had disease onset within 3 days after birth, and 19 (53%) had convulsion as the initial presentation. Cerebral MRI showed that 22 neonates (61%) had left cerebral infarction and 13 (36%) had basal ganglia infarction. Magnetic resonance angiography was performed for 12 neonates, among whom 9 (75%) had involvement of the middle cerebral artery. Electroencephalography was performed for 29 neonates, with sharp waves in 21 neonates (72%) and seizures in 10 neonates (34%). Symptomatic/supportive treatment varied across different hospitals. Neonatal Behavioral Neurological Assessment was performed for 12 neonates (33%, 12/36), with a mean score of 32±4 points. The prognosis of 27 neonates was followed up to around 12 months of age, with 44% (12/27) of the neonates having a good prognosis. CONCLUSIONS: Ischemic stroke is the main type of neonatal stroke, often with convulsions as the initial presentation, involvement of the middle cerebral artery, sharp waves on electroencephalography, and a relatively low neurodevelopment score. Symptomatic/supportive treatment is the main treatment method, and some neonates tend to have a poor prognosis.


Subject(s)
Stroke , Humans , Male , Infant, Newborn , Female , China/epidemiology , Stroke/epidemiology , Prognosis , Electroencephalography , Incidence , Magnetic Resonance Imaging
16.
Blood Adv ; 8(13): 3507-3518, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38739715

ABSTRACT

Little is known about risk factors for central nervous system (CNS) relapse in mature T-cell and natural killer cell neoplasms (MTNKNs). We aimed to describe the clinical epidemiology of CNS relapse in patients with MTNKN and to develop the CNS relapse In T-cell lymphoma Index (CITI) to predict patients at the highest risk of CNS relapse. We reviewed data from 135 patients with MTNKN and CNS relapse from 19 North American institutions. After exclusion of leukemic and most cutaneous forms of MTNKNs, patients were pooled with non-CNS relapse control patients from a single institution to create a CNS relapse-enriched training set. Using a complete case analysis (n = 182), including 91 with CNS relapse, we applied a least absolute shrinkage and selection operator Cox regression model to select weighted clinicopathologic variables for the CITI score, which we validated in an external cohort from the Swedish Lymphoma Registry (n = 566). CNS relapse was most frequently observed in patients with peripheral T-cell lymphoma, not otherwise specified (25%). Median time to CNS relapse and median overall survival after CNS relapse were 8.0 and 4.7 months, respectively. We calculated unique CITI risk scores for individual training set patients and stratified them into risk terciles. Validation set patients with low-risk (n = 158) and high-risk (n = 188) CITI scores had a 10-year cumulative risk of CNS relapse of 2.2% and 13.4%, respectively (hazard ratio, 5.24; 95% confidence interval, 1.50-18.26; P = .018). We developed an open-access web-based CITI calculator (https://redcap.link/citicalc) to provide an easy tool for clinical practice. The CITI score is a validated model to predict patients with MTNKN at the highest risk of developing CNS relapse.


Subject(s)
Central Nervous System Neoplasms , Humans , Central Nervous System Neoplasms/diagnosis , Central Nervous System Neoplasms/secondary , Central Nervous System Neoplasms/pathology , Central Nervous System Neoplasms/mortality , Male , Female , Middle Aged , Aged , Adult , Lymphoma, T-Cell/pathology , Lymphoma, T-Cell/diagnosis , Lymphoma, T-Cell/mortality , Prognosis , Aged, 80 and over , Neoplasm Recurrence, Local , Lymphoma, Extranodal NK-T-Cell/diagnosis , Lymphoma, Extranodal NK-T-Cell/mortality , Lymphoma, Extranodal NK-T-Cell/therapy , Risk Factors , Recurrence , Killer Cells, Natural , Young Adult
17.
Comput Struct Biotechnol J ; 23: 1945-1950, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38736693

ABSTRACT

Integrative analysis of multi-omics data has the potential to yield valuable and comprehensive insights into the molecular mechanisms underlying complex diseases such as cancer and Alzheimer's disease. However, a number of analytical challenges complicate multi-omics data integration. For instance, -omics data are usually high-dimensional, and sample sizes in multi-omics studies tend to be modest. Furthermore, when genes in an important pathway have relatively weak signal, it can be difficult to detect them individually. There is a growing body of literature on knowledge-guided learning methods that can address these challenges by incorporating biological knowledge such as functional genomics and functional proteomics into multi-omics data analysis. These methods have been shown to outperform their counterparts that do not utilize biological knowledge in tasks including prediction, feature selection, clustering, and dimension reduction. In this review, we survey recently developed methods and applications of knowledge-guided multi-omics data integration methods and discuss future research directions.

18.
BMC Vet Res ; 20(1): 191, 2024 May 11.
Article in English | MEDLINE | ID: mdl-38734611

ABSTRACT

BACKGROUND: Many proteins of African swine fever virus (ASFV), such as p72, p54, p30, CD2v, and K205R, have been successfully expressed and characterized. However, there are few reports on the DP96R protein, a virulence protein of ASFV that plays an important role in host infection and invasion. RESULTS: First, a prokaryotic expression vector for the DP96R gene was constructed, a prokaryotic system was used to induce expression of the DP96R protein, and monoclonal antibodies were prepared by immunizing mice. Four monoclonal cell lines against the DP96R protein were obtained after three rounds of ELISA screening and two rounds of subcloning; the titer of ascites antibody was up to 1:500,000, and the monoclonal antibodies specifically recognized the DP96R protein. Finally, the subtypes of the four monoclonal antibodies were identified and the minimum epitopes they recognize were determined. CONCLUSION: Monoclonal antibodies against the ASFV DP96R protein were successfully prepared and identified, which lays a foundation for further exploration of the structure and function of the DP96R protein and of ASFV diagnostic technology.


Subject(s)
African Swine Fever Virus , Antibodies, Monoclonal , Epitopes , Mice, Inbred BALB C , Viral Proteins , Animals , Female , Mice , African Swine Fever/immunology , African Swine Fever/virology , African Swine Fever Virus/immunology , Antibodies, Monoclonal/immunology , Antibodies, Viral/immunology , Epitopes/immunology , Swine , Viral Proteins/immunology
19.
medRxiv ; 2024 May 07.
Article in English | MEDLINE | ID: mdl-38765975

ABSTRACT

Electronic health records offer great promise for early disease detection, treatment evaluation, information discovery, and other important facets of precision health. Clinical notes, in particular, may contain nuanced information about a patient's condition, treatment plans, and history that structured data may not capture. As a result, and with advancements in natural language processing, clinical notes have been increasingly used in supervised prediction models. To predict long-term outcomes such as chronic disease and mortality, it is often advantageous to leverage data occurring at multiple time points in a patient's history. However, these data are often collected at irregular time intervals and varying frequencies, thus posing an analytical challenge. Here, we propose the use of large language models (LLMs) for robust temporal harmonization of clinical notes across multiple visits. We compare multiple state-of-the-art LLMs in their ability to generate useful information during time gaps, and evaluate performance in supervised deep learning models for clinical prediction.
