Results 1 - 20 of 26
1.
Biometrics ; 80(1)2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38483282

ABSTRACT

There is a growing body of literature on knowledge-guided statistical learning methods for analysis of structured high-dimensional data (such as genomic and transcriptomic data) that can incorporate knowledge of underlying networks derived from functional genomics and functional proteomics. These methods have been shown to improve variable selection and prediction accuracy and yield more interpretable results. However, these methods typically use graphs extracted from existing databases or rely on subject matter expertise, which are known to be incomplete and may contain false edges. To address this gap, we propose a graph-guided Bayesian modeling framework to account for network noise in regression models involving structured high-dimensional predictors. Specifically, we use 2 sources of network information, including the noisy graph extracted from existing databases and the estimated graph from observed predictors in the dataset at hand, to inform the model for the true underlying network via a latent scale modeling framework. This model is coupled with the Bayesian regression model with structured high-dimensional predictors involving an adaptive structured shrinkage prior. We develop an efficient Markov chain Monte Carlo algorithm for posterior sampling. We demonstrate the advantages of our method over existing methods in simulations, and through analyses of a genomics dataset and another proteomics dataset for Alzheimer's disease.


Subject(s)
Alzheimer Disease , Genomics , Humans , Bayes Theorem , Algorithms , Alzheimer Disease/genetics , Databases, Factual
2.
Biometrics ; 80(1)2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38281768

ABSTRACT

There has been an increasing interest in decomposing high-dimensional multi-omics data into a product of low-rank and sparse matrices for the purpose of dimension reduction and feature engineering. Bayesian factor models achieve such low-dimensional representation of the original data through different sparsity-inducing priors. However, few of these models can efficiently incorporate the information encoded by the biological graphs, which has already been proven to be useful in many analysis tasks. In this work, we propose a Bayesian factor model with novel hierarchical priors, which incorporate the biological graph knowledge as a tool for identifying a group of genes functioning collaboratively. The proposed model therefore enables sparsity within networks by allowing each factor loading to be shrunk adaptively and by considering additional layers to relate individual shrinkage parameters to the underlying graph information, both of which yield a more accurate structure recovery of factor loadings. Further, these new priors overcome the phase transition phenomenon, in contrast to existing graph-incorporated approaches, so that the model is robust to noisy edges that are inconsistent with the actual sparsity structure of the factor loadings. Finally, our model can handle both continuous and discrete data types. The proposed method is shown to outperform several existing factor analysis methods through simulation experiments and real data analyses.


Subject(s)
Algorithms , Bayes Theorem , Computer Simulation , Factor Analysis, Statistical
3.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38058188

ABSTRACT

Biclustering is a useful method for simultaneously grouping samples and features and has been applied across various biomedical data types. However, most existing biclustering methods lack the ability to integratively analyze multi-modal data such as multi-omics data, including the genome, transcriptome and epigenome. Moreover, the potential of leveraging biological knowledge represented by graphs, which has been demonstrated to be beneficial in various statistical tasks such as variable selection and prediction, remains largely untapped in the context of biclustering. To address both, we propose a novel Bayesian biclustering method called Bayesian graph-guided biclustering (BGB). Specifically, we introduce a new hierarchical sparsity-inducing prior to effectively incorporate biological graph information and establish a unified framework to model multi-view data. We develop an efficient Markov chain Monte Carlo algorithm to conduct posterior sampling and inference. Extensive simulations and real data analysis show that BGB outperforms other popular biclustering methods. Notably, BGB is robust in terms of utilizing biological knowledge and has the capability to reveal biologically meaningful information from heterogeneous multi-modal data.


Subject(s)
Algorithms , Multiomics , Bayes Theorem , Cluster Analysis , Transcriptome
4.
J Am Stat Assoc ; 118(543): 1473-1487, 2023.
Article in English | MEDLINE | ID: mdl-37982009

ABSTRACT

With distinct advantages in power over behavioral phenotypes, brain imaging traits have become emerging endophenotypes to dissect molecular contributions to behaviors and neuropsychiatric illnesses. Among different imaging features, brain structural connectivity (i.e., the structural connectome), which summarizes the anatomical connections between different brain regions, is one of the most cutting-edge yet under-investigated traits, and the genetic influence on the structural connectome variation remains highly elusive. Relying on a landmark imaging genetics study for young adults, we develop a biologically plausible brain network response shrinkage model to comprehensively characterize the relationship between high dimensional genetic variants and the structural connectome phenotype. Under a unified Bayesian framework, we accommodate the topology of the brain network and the biological architecture within the genome, and eventually establish a mechanistic mapping between genetic biomarkers and the associated brain sub-network units. An efficient expectation-maximization algorithm is developed to estimate the model and ensure computing feasibility. In the application to the Human Connectome Project Young Adult (HCP-YA) data, we establish the genetic underpinnings, which are highly interpretable under functional annotation and brain tissue eQTL analysis, for the brain white matter tracts connecting the hippocampus and two cerebral hemispheres. We also show the superiority of our method in extensive simulations.

5.
Biostatistics ; 2023 Jul 26.
Article in English | MEDLINE | ID: mdl-37494883

ABSTRACT

Radionuclide imaging plays a critical role in the diagnosis and management of kidney obstruction. However, most practicing radiologists in US hospitals have insufficient time and resources to acquire the training and experience needed to interpret radionuclide images, leading to increased diagnostic errors. To tackle this problem, Emory University embarked on a study that aims to develop a computer-assisted diagnostic (CAD) tool for kidney obstruction by mining and analyzing patient data comprising renogram curves, ordinal expert ratings on the obstruction status, pharmacokinetic variables, and demographic information. The major challenges here are the heterogeneity in data modes and the lack of a gold standard for determining kidney obstruction. In this article, we develop a statistically principled CAD tool based on an integrative latent class model that leverages heterogeneous data modalities available for each patient to provide accurate prediction of kidney obstruction. Our integrative model consists of three sub-models (multilevel functional latent factor regression model, probit scalar-on-function regression model, and Gaussian mixture model), each of which is tailored to the specific data mode and depends on the unknown obstruction status (latent class). An efficient MCMC algorithm is developed to train the model and predict kidney obstruction with associated uncertainty. Extensive simulations are conducted to evaluate the performance of the proposed method. An application to an Emory renal study demonstrates the usefulness of our model as a CAD tool for kidney obstruction.

6.
Stat Anal Data Min ; 16(2): 120-134, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37213790

ABSTRACT

Integrative learning of multiple datasets has the potential to mitigate the challenge of small n and large p that is often encountered in analysis of big biomedical data such as genomics data. Detection of weak yet important signals can be enhanced by jointly selecting features for all datasets. However, the set of important features may not always be the same across all datasets. Although some existing integrative learning methods allow heterogeneous sparsity structure where a subset of datasets can have zero coefficients for some selected features, they tend to yield reduced efficiency, reinstating the problem of losing weak important signals. We propose a new integrative learning approach which can not only aggregate important signals well in homogeneous sparsity structure, but also substantially alleviate the problem of losing weak important signals in heterogeneous sparsity structure. Our approach exploits a priori known graphical structure of features and encourages joint selection of features that are connected in the graph. Integrating such prior information over multiple datasets enhances the power, while also accounting for the heterogeneity across datasets. Theoretical properties of the proposed method are investigated. We also demonstrate the limitations of existing approaches and the superiority of our method using a simulation study and analysis of gene expression data from ADNI.

7.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36882008

ABSTRACT

MOTIVATION: With the rapid development of modern technologies, massive data are available for the systematic study of Alzheimer's disease (AD). Though many existing AD studies mainly focus on single-modality omics data, multi-omics datasets can provide a more comprehensive understanding of AD. To bridge this gap, we proposed a novel structural Bayesian factor analysis framework (SBFA) to extract the information shared by multi-omics data through the aggregation of genotyping data, gene expression data, neuroimaging phenotypes and prior biological network knowledge. Our approach can extract common information shared by different modalities and encourage biologically related features to be selected, guiding future AD research in a biologically meaningful way. METHOD: Our SBFA model decomposes the mean parameters of the data into a sparse factor loading matrix and a factor matrix, where the factor matrix represents the common information extracted from multi-omics and imaging data. Our framework is designed to incorporate prior biological network information. Our simulation study demonstrated that our proposed SBFA framework could achieve the best performance compared with the other state-of-the-art factor-analysis-based integrative analysis methods. RESULTS: We apply our proposed SBFA model together with several state-of-the-art factor analysis models to extract the latent common information from genotyping, gene expression and brain imaging data simultaneously from the ADNI biobank database. The latent information is then used to predict the functional activities questionnaire score, an important measurement for diagnosis of AD quantifying subjects' abilities in daily life. Our SBFA model shows the best prediction performance compared with the other factor analysis models. AVAILABILITY: Code is publicly available at https://github.com/JingxuanBao/SBFA. CONTACT: qlong@upenn.edu.


Subject(s)
Multiomics , Neuroimaging , Bayes Theorem , Neuroimaging/methods , Brain/diagnostic imaging , Phenotype
8.
Biometrics ; 79(3): 2357-2369, 2023 09.
Article in English | MEDLINE | ID: mdl-36305019

ABSTRACT

Electronic health records (EHRs) offer great promise for advancing precision medicine and, at the same time, present significant analytical challenges. In particular, patient-level data in EHRs often cannot be shared across institutions (data sources) due to government regulations and/or institutional policies. As a result, there is growing interest in distributed learning over multiple EHR databases without sharing patient-level data. To tackle such challenges, we propose a novel communication efficient method that aggregates the optimal estimates of external sites, by turning the problem into a missing data problem. In addition, we propose incorporating posterior samples of remote sites, which can provide partial information on the missing quantities and improve efficiency of parameter estimates while having the differential privacy property and thus reducing the risk of information leakage. The proposed approach, without sharing the raw patient level data, allows for proper statistical inference. We provide theoretical investigation for the asymptotic properties of the proposed method for statistical inference as well as differential privacy, and evaluate its performance in simulations and real data analyses in comparison with several recently developed methods.


Subject(s)
Electronic Health Records , Privacy , Humans , Databases, Factual , Data Analysis , Communication
9.
Biostatistics ; 24(1): 161-176, 2022 12 12.
Article in English | MEDLINE | ID: mdl-34520533

ABSTRACT

Single-cell RNA-sequencing (scRNAseq) data contain a high level of noise, especially in the form of zero-inflation, that is, the presence of an excessively large number of zeros. This is largely due to dropout events and amplification biases that occur in the preparation stage of single-cell experiments. Recent scRNAseq experiments have been augmented with unique molecular identifiers (UMI) and External RNA Control Consortium (ERCC) molecules which can be used to account for zero-inflation. However, most of the current methods on graphical models are developed under the assumption of the multivariate Gaussian distribution or its variants, and thus they are not able to adequately account for an excessively large number of zeros in scRNAseq data. In this article, we propose a single-cell latent graphical model (scLGM), a Bayesian hierarchical model for estimating the conditional dependency network among genes using scRNAseq data. Taking advantage of UMI and ERCC data, scLGM explicitly models the two sources of zero-inflation. Our simulation study and real data analysis demonstrate that the proposed approach outperforms several existing methods.


Subject(s)
RNA , Single-Cell Analysis , Humans , Sequence Analysis, RNA/methods , Bayes Theorem , RNA/genetics , Computer Simulation
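
The zero-inflation described in entry 9 can be illustrated with a generic zero-inflated Poisson sampler. This is a minimal sketch, not the scLGM model: the rate of 3.0, the 50% dropout probability, and all function names are hypothetical values chosen only for illustration.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) count using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_zero_inflated(rate, dropout_prob, rng):
    """Return 0 with probability dropout_prob, otherwise a Poisson(rate) count."""
    if rng.random() < dropout_prob:
        return 0  # structural zero, e.g. a dropout event
    return sample_poisson(rate, rng)


def zero_fraction(counts):
    """Share of observations that are exactly zero."""
    return sum(1 for c in counts if c == 0) / len(counts)


rng = random.Random(0)  # fixed seed so the run is repeatable
plain = [sample_poisson(3.0, rng) for _ in range(5000)]
inflated = [sample_zero_inflated(3.0, 0.5, rng) for _ in range(5000)]

print(f"zeros without dropout: {zero_fraction(plain):.3f}")
print(f"zeros with 50% dropout: {zero_fraction(inflated):.3f}")
```

In theory the zero probability rises from exp(-3), about 0.05, to 0.5 + 0.5 * exp(-3), about 0.52, which is why a Gaussian model fitted to such counts would be badly misspecified.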
10.
Stat Med ; 40(22): 4772-4793, 2021 09 30.
Article in English | MEDLINE | ID: mdl-34102703

ABSTRACT

Existing missing data methods for functional data mainly focus on reconstructing missing measurements along a single function, that is, a univariate functional data setting. Motivated by a renal study, we focus on a bivariate functional data setting, where each sampling unit is a collection of two distinct component functions, one of which may be missing. Specifically, we propose a Bayesian multiple imputation approach based on a bivariate functional latent factor model that exploits the joint changing patterns of the component functions to allow accurate and stable imputation of one component given the other. We further extend the framework to address multilevel bivariate functional data with missing components by modeling and exploiting inter-component and intra-subject correlations. We develop a Gibbs sampling algorithm that simultaneously generates multiple imputations of missing component functions and posterior samples of model parameters. For multilevel bivariate functional data, a partially collapsed Gibbs sampler is implemented to improve computational efficiency. Our simulation study demonstrates that our methods outperform other competing methods for imputing missing components of bivariate functional data under various designs and missingness rates. The motivating renal study aims to investigate the distribution and pharmacokinetic properties of baseline and post-furosemide renogram curves that provide further insights into the underlying mechanism of renal obstruction, with post-furosemide renogram curves missing for some subjects. We apply the proposed methods to impute missing post-furosemide renogram curves and obtain more refined insights.


Subject(s)
Algorithms , Bayes Theorem , Computer Simulation , Data Interpretation, Statistical , Humans
11.
Sci Rep ; 11(1): 5146, 2021 03 04.
Article in English | MEDLINE | ID: mdl-33664338

ABSTRACT

Multi-modal molecular profiling data in bulk tumors or single cells are accumulating at a fast pace. There is a great need for developing statistical and computational methods to reveal molecular structures in complex data types toward biological discoveries. Here, we introduce Nebula, a novel Bayesian integrative clustering analysis for high dimensional multi-modal molecular data to identify directly interpretable clusters and associated biomarkers in a unified and biologically plausible framework. To facilitate computational efficiency, a variational Bayes approach is developed to approximate the joint posterior distribution to achieve model inference in high-dimensional settings. We describe a pan-cancer data analysis of genomic, epigenomic, and transcriptomic alterations in close to 9000 tumor samples across canonical oncogenic signaling pathways, immune and stemness phenotype, with comparisons to state-of-the-art clustering methods. We demonstrate that Nebula has the unique advantage of revealing patterns on the basis of shared pathway alterations, offering biological and clinical insights beyond tumor type and histology in the pan-cancer analysis setting. We also illustrate the utility of Nebula in single cell data for immune cell decomposition in peripheral blood samples.


Subject(s)
Carcinogenesis/genetics , Computational Biology/statistics & numerical data , Genomics/statistics & numerical data , Neoplasms/genetics , Bayes Theorem , Cluster Analysis , Epigenomics , Humans , Models, Statistical , Neoplasms/pathology , Transcriptome/genetics
12.
Proc IEEE Int Conf Big Data ; 2021: 4472-4479, 2021 Dec.
Article in English | MEDLINE | ID: mdl-35187547

ABSTRACT

Support vector machine (SVM) is a popular classification method for the analysis of a wide range of data including big biomedical data. Many SVM methods with feature selection have been developed under the frequentist regularization or Bayesian shrinkage frameworks. On the other hand, the value of incorporating a priori known biological knowledge, such as that from functional genomics and functional proteomics, into statistical analysis of -omic data has been recognized in recent years. Such biological information is often represented by graphs. We propose a novel method that assigns Laplace priors to the regression coefficients and incorporates the underlying graph information via a hyper-prior for the shrinkage parameters in the Laplace priors. This enables smoothing of shrinkage parameters for connected variables in the graph and conditional independence between shrinkage parameters for disconnected variables. Extensive simulations demonstrate that our proposed method achieves the best performance compared to the other existing SVM methods in terms of prediction accuracy. The proposed method is also illustrated in analysis of genomic data from cancer studies, demonstrating its advantage in generating biologically meaningful results and identifying potentially important features.
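
The link between Laplace priors and shrinkage in entry 12 can be written out explicitly. The following is a generic sketch rather than the paper's exact model: $L(\beta)$ denotes any loss treated as a negative log pseudo-likelihood (for an SVM, the hinge loss) and $\lambda_j$ a coefficient-specific shrinkage parameter; both symbols are notation assumed here, not taken from the abstract.

```latex
\begin{align}
p(\beta_j \mid \lambda_j) &= \frac{\lambda_j}{2}\,\exp\!\left(-\lambda_j\,\lvert\beta_j\rvert\right),
  \qquad j = 1,\dots,p, \\
-\log p(\beta \mid \text{data}, \lambda) &= L(\beta) + \sum_{j=1}^{p} \lambda_j\,\lvert\beta_j\rvert + \text{const}.
\end{align}
```

For fixed $\lambda$, the posterior mode therefore minimizes a weighted $\ell_1$-penalized loss. A hyper-prior that makes the $\lambda_j$ of connected variables similar then yields similar amounts of shrinkage for neighbors in the graph, which matches the smoothing behavior the abstract describes.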

13.
Nat Commun ; 11(1): 5467, 2020 10 29.
Article in English | MEDLINE | ID: mdl-33122624

ABSTRACT

Distributed health data networks (DHDNs) leverage data from multiple sources or sites such as electronic health records (EHRs) from multiple healthcare systems and have drawn increasing interest in recent years, as they do not require sharing of subject-level data and hence lower the hurdles for collaboration between institutions considerably. However, DHDNs face a number of challenges in data analysis, particularly in the presence of missing data. The current state-of-the-art methods for handling incomplete data require pooling data into a central repository before analysis, which is not feasible in DHDNs. In this paper, we address the missing data problem in distributed environments such as DHDNs that has not been investigated previously. We develop communication-efficient distributed multiple imputation methods for incomplete data that are horizontally partitioned. Since subject-level data are not shared or transferred outside of each site in the proposed methods, they enhance protection of patient privacy and have the potential to strengthen public trust in analysis of sensitive health data. We investigate, through extensive simulation studies, the performance of these methods. Our methods are applied to the analysis of an acute stroke dataset collected from multiple hospitals, mimicking a DHDN where health data are horizontally partitioned across hospitals and subject-level data cannot be shared or sent to a central data repository.


Subject(s)
Data Analysis , Electronic Health Records , Delivery of Health Care/statistics & numerical data , Humans
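
The idea in entry 13 of analyzing horizontally partitioned data without moving subject-level records can be illustrated with a far simpler problem than the paper's multiple imputation: a one-predictor least-squares fit computed from per-site summary statistics. This sketch is not the authors' algorithm; the function names and the toy data at the two sites are hypothetical.

```python
def site_summary(xs, ys):
    """Aggregate statistics a site can share instead of subject-level rows."""
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }


def pooled_fit(summaries):
    """Least-squares intercept and slope from summed site summaries."""
    total = {key: sum(s[key] for s in summaries) for key in summaries[0]}
    n, sx, sy, sxx, sxy = (total[k] for k in ("n", "sx", "sy", "sxx", "sxy"))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Hypothetical toy data held at two sites (not from the study).
site_a = ([1.0, 2.0, 3.0], [2.1, 3.9, 6.2])
site_b = ([4.0, 5.0], [8.1, 9.8])

intercept, slope = pooled_fit([site_summary(*site_a), site_summary(*site_b)])
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

Because the sums are additive across sites, the result is identical to fitting the model on the pooled rows, yet only five numbers per site leave each institution.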
14.
Am J Otolaryngol ; 41(6): 102694, 2020.
Article in English | MEDLINE | ID: mdl-32854041

ABSTRACT

PURPOSE: Head and neck surgeons are among the highest risk for COVID-19 exposure, which also brings great risk to their mental wellbeing. In this study, we aim to evaluate mental health symptoms among head and neck surgeons in Brazil surrounding the time it was declared the epicenter of the virus. MATERIALS AND METHODS: A cross-sectional, survey-based study evaluating burnout, anxiety, distress, and depression among head and neck surgeons in Brazil, assessed through the single-item Mini-Z burnout assessment, 7-item Generalized Anxiety Disorder scale, 22-item Impact of Event Scale-Revised, and 2-item Patient Health Questionnaire, respectively. RESULTS: 163 physicians completed the survey (74.2% males). Anxiety, distress, burnout, and depression symptoms were reported in 74 (45.5%), 43 (26.3%), 24 (14.7%), and 26 (16.0%) physicians, respectively. On multivariable analysis, female physicians were more likely to report a positive screening for burnout compared to males (OR 2.88, CI [1.07-7.74]). Physicians 45 years or older were less likely to experience anxiety symptoms than those younger than 45 years (OR 0.40, CI [0.20-0.81]). Physicians with no self-reported prior psychiatric conditions were less likely to have symptoms of distress compared to those with such history (OR 0.11, CI [0.33-0.38]). CONCLUSION: Head and neck surgeons in Brazil reported symptoms of burnout, anxiety, distress and depression during our study period within the COVID-19 pandemic. Institutions should monitor these symptoms throughout the pandemic. Further study is required to assess the long-term implications for physician wellness.


Subject(s)
Anxiety/epidemiology , Burnout, Professional/epidemiology , Coronavirus Infections/epidemiology , Depression/epidemiology , Occupational Stress/epidemiology , Otolaryngologists/psychology , Pneumonia, Viral/epidemiology , Surgeons/psychology , Adult , Age Factors , Aged , Betacoronavirus , Brazil/epidemiology , COVID-19 , Cross-Sectional Studies , Female , Humans , Male , Middle Aged , Pandemics , SARS-CoV-2 , Sex Factors , Stress, Psychological/epidemiology , Surveys and Questionnaires
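
The odds ratios and confidence intervals printed in entry 14 can be sanity-checked mechanically. The sketch below assumes the intervals are Wald-type intervals that are symmetric on the log scale, which the abstract does not state, so the geometric mean of the bounds should sit close to the point estimate. The labels, the function name and the 5% tolerance are choices made here for illustration.

```python
import math

# (label, odds ratio, CI lower, CI upper) exactly as printed in the abstract.
reported = [
    ("burnout, female vs male", 2.88, 1.07, 7.74),
    ("anxiety, age 45+ vs under 45", 0.40, 0.20, 0.81),
    ("distress, no prior psychiatric history", 0.11, 0.33, 0.38),
]


def is_consistent(odds_ratio, lower, upper, rel_tol=0.05):
    """True if the OR lies in the CI and near the geometric mean of its bounds."""
    inside = lower <= odds_ratio <= upper
    midpoint = math.sqrt(lower * upper)  # centre of the interval on the log scale
    return inside and abs(midpoint - odds_ratio) / odds_ratio <= rel_tol


results = [is_consistent(o, lo, hi) for _, o, lo, hi in reported]
for (label, *_), ok in zip(reported, results):
    print(f"{label}: {'consistent' if ok else 'check against the source'}")
```

The first two intervals pass, while the third (OR 0.11, CI [0.33-0.38]) does not contain its own point estimate as printed here, so that figure should be verified against the published article before it is reused.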
15.
OTO Open ; 4(3): 2473974X20948835, 2020.
Article in English | MEDLINE | ID: mdl-32839747

ABSTRACT

OBJECTIVE: Nonphysician health care workers are involved in high-risk patient care during the COVID-19 pandemic, placing them at high risk of mental health burden. The mental health impact of COVID-19 in this crucial population has not been studied thus far. Thus, the objective of this study is to assess the psychosocial well-being of these providers. STUDY DESIGN: National cross-sectional online survey (no control group). SETTING: Academic otolaryngology programs in the United States. SUBJECTS AND METHODS: We distributed a survey to nonphysician health care workers in otolaryngology departments across the United States. The survey incorporated a variety of validated mental health assessment tools to measure participant burnout (Mini-Z assessment), anxiety (Generalized Anxiety Disorder-7), distress (Impact of Event Scale), and depression (Patient Health Questionnaire-2). Multivariable logistic regression analysis was performed to determine predictive factors associated with these mental health outcomes. RESULTS: We received 347 survey responses: 248 (71.5%) nurses, 63 (18.2%) administrative staff, and 36 (10.4%) advanced practice providers. A total of 104 (30.0%) respondents reported symptoms of burnout; 241 (69.5%), symptoms of anxiety; 292 (84.1%), symptoms of at least mild distress; and 79 (22.8%), symptoms of depression. Upon further analysis, development of these symptoms was associated with factors such as occupation, practice setting, and case load. CONCLUSION: Frontline otolaryngology health care providers exhibit high rates of mental health complications, particularly anxiety and distress, in the wake of COVID-19. Adequate support systems must be put into place to address these issues.
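
The counts and percentages reported in entry 15 can be cross-checked with a few lines of arithmetic. The sketch below only recomputes figures that appear in the abstract; the dictionary keys and the function name are labels introduced here.

```python
TOTAL_RESPONSES = 347

# Counts and percentages exactly as reported in the abstract.
reported = {
    "nurses": (248, 71.5),
    "administrative staff": (63, 18.2),
    "advanced practice providers": (36, 10.4),
    "burnout": (104, 30.0),
    "anxiety": (241, 69.5),
    "distress (at least mild)": (292, 84.1),
    "depression": (79, 22.8),
}


def percent(count, total=TOTAL_RESPONSES):
    """Percentage of respondents, rounded to one decimal place."""
    return round(100 * count / total, 1)


recomputed = {name: percent(count) for name, (count, _) in reported.items()}
mismatches = [name for name, (_, pct) in reported.items() if recomputed[name] != pct]
print(mismatches)
```

All seven recomputed percentages agree with the reported ones, and the three occupational groups sum to the 347 responses. The role percentages add up to 100.1% only because each is rounded separately.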

16.
Head Neck ; 42(7): 1597-1609, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32496637

ABSTRACT

BACKGROUND: Otolaryngologists are among the highest risk for COVID-19 exposure. METHODS: This is a cross-sectional, survey-based, national study evaluating academic otolaryngologists. Burnout, anxiety, distress, and depression were assessed by the single-item Mini-Z Burnout Assessment, 7-item Generalized Anxiety Disorder Scale, 15-item Impact of Event Scale, and 2-item Patient Health Questionnaire, respectively. RESULTS: A total of 349 physicians completed the survey. Of them, 165 (47.3%) were residents and 212 (60.7%) were males. Anxiety, distress, burnout, and depression were reported in 167 (47.9%), 210 (60.2%), 76 (21.8%), and 37 (10.6%) physicians, respectively. Attendings had decreased burnout relative to residents (odds ratio [OR] 0.28, confidence interval [CI] [0.11-0.68]; P = .005). Females had increased burnout (OR 1.93, CI [1.12-3.32]; P = .018), anxiety (OR 2.53, CI [1.59-4.02]; P < .005), and distress (OR 2.68, CI [1.64-4.37]; P < .005). Physicians in states with greater than 20 000 positive cases had increased distress (OR 2.01, CI [1.22-3.31]; P = .006). CONCLUSION: During the COVID-19 pandemic, the prevalence of burnout, anxiety, and distress is high among academic otolaryngologists.


Subject(s)
Coronavirus Infections/epidemiology , Internship and Residency , Medical Staff, Hospital/psychology , Otolaryngologists/psychology , Pneumonia, Viral/epidemiology , Adult , Anxiety/epidemiology , Betacoronavirus , Burnout, Professional/epidemiology , COVID-19 , Cross-Sectional Studies , Depression/epidemiology , Female , Humans , Male , Medical Staff, Hospital/statistics & numerical data , Otolaryngologists/statistics & numerical data , Pandemics , SARS-CoV-2 , Sex Factors , Stress, Psychological/epidemiology , Surveys and Questionnaires , United States/epidemiology
17.
Proc SIAM Int Conf Data Min ; 2020: 604-612, 2020.
Article in English | MEDLINE | ID: mdl-32440369

ABSTRACT

Integrative analysis jointly analyzes multiple data sets to overcome the curse of dimensionality. It can detect important but weak signals by jointly selecting features for all data sets, but unfortunately the sets of important features are not always the same for all data sets. Variations that allow a heterogeneous sparsity structure, in which a subset of data sets can have a zero coefficient for a selected feature, have been proposed, but they compromise the effect of integrative analysis, reintroducing the problem of losing weak important signals. We propose a new integrative analysis approach which not only aggregates weak important signals well in the homogeneity setting but also substantially alleviates the problem of losing weak important signals in the heterogeneity setting. Our approach exploits a priori known graphical structure of features by forcing joint selection of adjacent features, and integrating such information over multiple data sets can increase the power while taking into account the heterogeneity across data sets. We confirm the problem of existing approaches and demonstrate the superiority of our method through a simulation study and an application to gene expression data from ADNI.

18.
J Am Stat Assoc ; 115(532): 1645-1663, 2020.
Article in English | MEDLINE | ID: mdl-34113054

ABSTRACT

Kidney obstruction, if untreated in a timely manner, can lead to irreversible loss of renal function. A widely used technology for evaluations of kidneys with suspected obstruction is diuresis renography. However, it is generally very challenging for radiologists who typically interpret renography data in practice to build a high level of competency due to the low volume of renography studies and insufficient training. Another challenge is that there is currently no gold standard for detection of kidney obstruction. Seeking to develop a computer-aided diagnostic (CAD) tool that can assist practicing radiologists to reduce errors in the interpretation of kidney obstruction, a recent study collected data from diuresis renography, interpretations on the renography data from highly experienced nuclear medicine experts as well as clinical data. To achieve the objective, we develop a statistical model that can be used as a CAD tool for assisting radiologists in kidney interpretation. We use a Bayesian latent class modeling approach for predicting kidney obstruction through the integrative analysis of time-series renogram data, expert ratings, and clinical variables. A nonparametric Bayesian latent factor regression approach is adopted for modeling renogram curves in which the coefficients of the basis functions are parameterized via the factor loadings dependent on the latent disease status and the extended latent factors that can also adjust for clinical variables. A hierarchical probit model is used for expert ratings, allowing for training with rating data from multiple experts while predicting with at most one expert, which makes the proposed model operable in practice. An efficient MCMC algorithm is developed to train the model and predict kidney obstruction with associated uncertainty. We demonstrate the superiority of the proposed method over several existing methods through extensive simulations. Analysis of the renal study also lends support to the usefulness of our model as a CAD tool to assist less experienced radiologists in the field.

19.
Biostatistics ; 21(3): 610-624, 2020 07 01.
Article in English | MEDLINE | ID: mdl-30596887

ABSTRACT

Biclustering techniques can identify local patterns of a data matrix by clustering feature space and sample space at the same time. Various biclustering methods have been proposed and successfully applied to analysis of gene expression data. While existing biclustering methods have many desirable features, most of them are developed for continuous data and few of them can efficiently handle -omics data of various types, for example, binomial data as in single nucleotide polymorphism data or negative binomial data as in RNA-seq data. In addition, none of the existing methods can utilize biological information such as that from functional genomics or proteomics. Recent work has shown that incorporating biological information can improve variable selection and prediction performance in analyses such as linear regression and multivariate analysis. In this article, we propose a novel Bayesian biclustering method that can handle multiple data types including Gaussian, Binomial, and Negative Binomial. In addition, our method uses a Bayesian adaptive structured shrinkage prior that enables feature selection guided by existing biological information. Our simulation studies and application to multi-omics datasets demonstrate robust and superior performance of the proposed method, compared to other existing biclustering methods.


Subject(s)
Biostatistics/methods , Computational Biology/methods , Models, Biological , Models, Statistical , Bayes Theorem , Cluster Analysis , Computer Simulation , Datasets as Topic , Genomics , Humans
20.
10th IEEE Int Conf Big Knowl (2019) ; 2019: 25-32, 2019 Nov.
Article in English | MEDLINE | ID: mdl-34290493

ABSTRACT

A biclustering in the analysis of a gene expression data matrix, for example, is defined as a set of biclusters where each bicluster is a group of genes and a group of samples for which the genes are differentially expressed. Although many data mining approaches for biclustering exist in the literature, only a few are able to incorporate prior knowledge into the analysis, which can lead to great improvements in terms of accuracy and interpretability, and all are limited in handling discrete data types. We propose a generalized biclustering approach that can be used for integrative analysis of multi-omics data with different data types. Our method is capable of utilizing biological information that can be represented by graphs, such as functional genomics and functional proteomics, and accommodating a combination of continuous and discrete data types. The proposed method builds on a generalized Bayesian factor analysis framework and a variational EM approach is used to obtain parameter estimates, where the latent quantities in the loglikelihood are iteratively imputed by their conditional expectations. The biclusters are retrieved via the sparse estimates of the factor loadings and the conditional expectation of the latent factors. In order to obtain the sparse conditional expectation of the latent factors, a novel sparse variational EM algorithm is used. We demonstrate the superiority of our method over several existing biclustering methods in extensive simulation experiments and in integrative analysis of multi-omics data.
