ABSTRACT
Bayes factors represent a useful alternative to P-values for reporting outcomes of hypothesis tests by providing direct measures of the relative support that data provide to competing hypotheses. Unfortunately, the competing hypotheses have to be specified, and the calculation of Bayes factors in high-dimensional settings can be difficult. To address these problems, we define Bayes factor functions (BFFs) directly from common test statistics. BFFs depend on a single noncentrality parameter that can be expressed as a function of standardized effects, and plots of BFFs versus effect size provide informative summaries of hypothesis tests that can be easily aggregated across studies. Such summaries eliminate the need for arbitrary P-value thresholds to define "statistical significance." Because BFFs are defined using nonlocal alternative prior densities, they provide more rapid accumulation of evidence in favor of true null hypotheses without sacrificing efficiency in supporting true alternative hypotheses. BFFs can be expressed in closed form and can be computed easily from z, t, χ2, and F statistics.
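As a simplified illustration of the idea (a point-alternative analogue rather than the nonlocal-prior BFF construction described above), the sketch below plots a Bayes factor for an observed z statistic as a function of a standardized effect size; the sample size, observed z value, and effect-size grid are hypothetical.

```python
# Sketch: a Bayes-factor curve for a z statistic as a function of
# standardized effect size, using a simple point alternative
# (not the nonlocal-prior BFF of the paper; an illustrative analogue).
import numpy as np
import matplotlib.pyplot as plt

def bf_point_alternative(z, omega, n):
    """BF10 for z ~ N(lambda, 1) vs N(0, 1), with lambda = sqrt(n) * omega."""
    lam = np.sqrt(n) * omega
    return np.exp(lam * z - 0.5 * lam ** 2)

z_obs, n = 2.4, 50                      # hypothetical observed z and sample size
omega = np.linspace(0.01, 1.0, 200)     # grid of standardized effect sizes
plt.plot(omega, bf_point_alternative(z_obs, omega, n))
plt.xlabel("standardized effect size")
plt.ylabel("Bayes factor vs. null")
plt.yscale("log")
plt.show()
```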
Subject(s)
Research Design, Bayes Theorem
ABSTRACT
Brucellosis is a major public health concern worldwide, especially for persons living in resource-limited settings. Historically, an evidence-based estimate of the global annual incidence of human cases has been elusive. We used international public health data to fill this information gap through application of risk metrics to worldwide and regional at-risk populations. We performed estimations using 3 statistical models (weighted average interpolation, bootstrap resampling, and Bayesian inference) and considered missing information. An evidence-based conservative estimate of the annual global incidence is 2.1 million, significantly higher than was previously assumed. Our models indicate Africa and Asia sustain most of the global risk and cases, although areas within the Americas and Europe remain of concern. This study reveals that disease risk and incidence are higher than previously suggested and lie mainly within resource-limited settings. Clarification of both misdiagnosis and underdiagnosis is required because those factors will amplify case estimates.
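For the bootstrap-resampling component, a minimal sketch of how regional incidence rates and at-risk populations might be combined into a global case estimate with an uncertainty interval is shown below; all rates and population sizes are hypothetical placeholders, not the study's data.

```python
# Sketch: bootstrap interval for total annual cases from regional incidence
# rates (per 100,000 at risk) and at-risk population sizes. All numbers are
# hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(1)
rates_per_100k = np.array([12.0, 45.0, 3.5, 60.0, 20.0])   # hypothetical regional rates
at_risk_pop = np.array([9e8, 6e8, 2e8, 4e8, 1e8])          # hypothetical at-risk populations
total_pop = at_risk_pop.sum()

def total_cases(mean_rate_per_100k, population):
    return mean_rate_per_100k / 1e5 * population

boot = np.array([
    total_cases(rng.choice(rates_per_100k, size=rates_per_100k.size, replace=True).mean(),
                total_pop)
    for _ in range(5000)
])
print("point estimate of annual cases:", round(total_cases(rates_per_100k.mean(), total_pop)))
print("95% bootstrap interval:", np.percentile(boot, [2.5, 97.5]).round())
```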
Subject(s)
Brucellosis, Humans, Bayes Theorem, Incidence, Africa, Asia, Brucellosis/epidemiology
ABSTRACT
De novo mutations (DNMs) are increasingly recognized as rare disease causal factors. Identifying DNM carriers will allow researchers to study the likely distinct molecular mechanisms of DNMs. We developed Famdenovo to predict DNM status (DNM or familial mutation [FM]) of deleterious autosomal dominant germline mutations for any syndrome. We introduce Famdenovo.TP53 for Li-Fraumeni syndrome (LFS) and analyze 324 LFS family pedigrees from four US cohorts: a validation set of 186 pedigrees and a discovery set of 138 pedigrees. The concordance index for Famdenovo.TP53 prediction was 0.95 (95% CI: [0.92, 0.98]). Forty individuals (95% CI: [30, 50]) were predicted as DNM carriers, increasing the total number from 42 to 82. We compared clinical and biological features of FM versus DNM carriers: (1) cancer and mutation spectra along with parental ages were similarly distributed; (2) ascertainment criteria such as early-onset breast cancer (age 20-35 yr) provide a condition for an unbiased estimate of the DNM rate: 48% (23 DNMs vs. 25 FMs); and (3) hotspot mutation R248W was not observed in DNMs, although it was as prevalent as hotspot mutation R248Q in FMs. Furthermore, we introduce Famdenovo.BRCA for hereditary breast and ovarian cancer syndrome and apply it to a small set of family data from the Cancer Genetics Network. In summary, we introduce a novel statistical approach to systematically evaluate deleterious DNMs in inherited cancer syndromes. Our approach may serve as a foundation for future studies evaluating how new deleterious mutations can be established in the germline, such as those in TP53.
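Because DNM status is binary (DNM vs. FM), a concordance index for predicted carrier probabilities reduces to the area under the ROC curve; a minimal sketch with hypothetical labels and predicted probabilities follows.

```python
# Sketch: concordance index for binary DNM-vs-FM predictions.
# With a binary outcome the concordance index equals the ROC AUC.
# Labels and predicted probabilities below are hypothetical.
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 0, 1, 0, 1, 0, 0]                             # 1 = de novo, 0 = familial
p_dnm  = [0.91, 0.12, 0.30, 0.77, 0.05, 0.66, 0.45, 0.20]     # predicted P(DNM)
print("concordance index:", roc_auc_score(y_true, p_dnm))
```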
Subject(s)
Breast Neoplasms/genetics, Genetic Predisposition to Disease/genetics, Germ-Line Mutation/genetics, Li-Fraumeni Syndrome/genetics, Ovarian Neoplasms/genetics, Adult, BRCA1 Protein/genetics, BRCA2 Protein/genetics, Breast Neoplasms/diagnosis, Family, Female, Humans, Pedigree, Tumor Suppressor Protein p53/genetics, Young Adult
ABSTRACT
Motivation: Multiple-marker analysis of genome-wide association study (GWAS) data has gained ample attention in recent years. However, because of the ultrahigh dimensionality of GWAS data, such analysis is challenging. Frequently used penalized regression methods often lead to a large number of false positives, whereas Bayesian methods are computationally very expensive. Motivated to ameliorate these issues simultaneously, we consider the novel approach of using non-local priors in an iterative variable selection framework. Results: We develop a variable selection method, named iterative non-local prior-based selection for GWAS, or GWASinlps, that combines, within an iterative framework, the computational efficiency of a screen-and-select approach based on association learning with the parsimonious uncertainty quantification provided by non-local priors. The hallmark of our method is a 'structured screen-and-select' strategy that performs hierarchical screening based not only on response-predictor associations but also on response-response associations, and concatenates variable selection within that hierarchy. Extensive simulation studies with single-nucleotide polymorphisms having realistic linkage disequilibrium structures demonstrate the advantages of our computationally efficient method over several frequentist and Bayesian variable selection methods in terms of true positive rate, false discovery rate, mean squared error, and effect size estimation error. Further, we provide an empirical power analysis useful for study design. Finally, a real GWAS data application was considered with human height as the phenotype. Availability and implementation: An R package implementing the GWASinlps method is available at https://cran.r-project.org/web/packages/GWASinlps/index.html. Supplementary information: Supplementary data are available at Bioinformatics online.
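A heavily simplified sketch of a screen-and-select loop is given below: marginal association screening followed by a selection step, iterated on residuals. The lasso is used here purely as a stand-in for the non-local prior model selection that GWASinlps actually performs, and the hierarchical, response-response "structured" screening is omitted; all data are simulated.

```python
# Sketch: an iterative screen-and-select loop on SNP-like data.
# Screening ranks predictors by marginal association with the residual;
# the select step uses lasso only as a stand-in for nonlocal-prior selection.
import numpy as np
from sklearn.linear_model import LassoCV

def screen_and_select(X, y, n_screen=100, n_iter=3):
    n, p = X.shape
    selected, residual = [], y.astype(float).copy()
    for _ in range(n_iter):
        assoc = np.abs(X.T @ (residual - residual.mean())) / n
        candidates = np.argsort(assoc)[::-1][:n_screen]        # screen
        fit = LassoCV(cv=5).fit(X[:, candidates], residual)    # select (stand-in)
        picked = candidates[np.nonzero(fit.coef_)[0]]
        new = [j for j in picked if j not in selected]
        if not new:
            break
        selected.extend(new)
        coef = np.linalg.lstsq(X[:, selected], y, rcond=None)[0]
        residual = y - X[:, selected] @ coef                    # update residual
    return selected

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2000))
y = 0.8 * X[:, 0] + 0.6 * X[:, 5] + rng.standard_normal(200)
print("selected predictors:", screen_and_select(X, y))
```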
Subject(s)
Genome-Wide Association Study, Linkage Disequilibrium, Polymorphism, Single Nucleotide, Software, Bayes Theorem, Computational Biology, Humans, Regression Analysis
ABSTRACT
Bayesian model selection procedures based on nonlocal alternative prior densities are extended to ultrahigh-dimensional settings and compared to other variable selection procedures using precision-recall curves. The procedures included in these comparisons comprise methods based on g-priors, the reciprocal lasso, the adaptive lasso, SCAD, and the minimax concave penalty. The use of precision-recall curves eliminates the sensitivity of our conclusions to the choice of tuning parameters. We find that Bayesian variable selection procedures based on nonlocal priors are competitive with all other procedures in a range of simulation scenarios, and we subsequently explain this favorable performance through a theoretical examination of their consistency properties. When certain regularity conditions apply, we demonstrate that the nonlocal procedures are consistent for linear models even when the number of covariates p increases sub-exponentially with the sample size n. A model selection procedure based on Zellner's g-prior is also found to be competitive with penalized likelihood methods in identifying the true model, but the posterior distribution that it induces on the model space is much more dispersed than the posterior distribution induced by the nonlocal prior methods. We investigate the asymptotic form of the marginal likelihood based on the nonlocal priors and show that it contains a unique term that cannot be derived from the other Bayesian model selection procedures. We also propose a scalable and efficient algorithm, Simplified Shotgun Stochastic Search with Screening (S5), to explore the enormous model space, and we show that S5 dramatically reduces the computing time without losing the capacity to search the interesting region of the model space, at least in the simulation settings considered. The S5 algorithm is available in the R package BayesS5 on CRAN.
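A minimal sketch of the evaluation idea follows: compute precision and recall of the recovered support along an entire selection path (here a lasso path on simulated data), so that conclusions do not hinge on any single tuning-parameter value. The dimensions and true support below are arbitrary.

```python
# Sketch: precision-recall points along a variable-selection path, so that
# method comparisons do not depend on one choice of tuning parameter.
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
n, p, true_support = 150, 500, {0, 1, 2, 3, 4}
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[list(true_support)] = 1.0
y = X @ beta + rng.standard_normal(n)

alphas, coefs, _ = lasso_path(X, y, n_alphas=50)     # coefs: (n_features, n_alphas)
for a, c in zip(alphas, coefs.T):
    est = set(np.nonzero(c)[0])
    if est:
        precision = len(est & true_support) / len(est)
        recall = len(est & true_support) / len(true_support)
        print(f"alpha={a:.3f}  precision={precision:.2f}  recall={recall:.2f}")
```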
ABSTRACT
We propose a Bayesian phase I/II dose-finding trial design that simultaneously accounts for toxicity and efficacy. We model the toxicity and efficacy of investigational doses using a flexible Bayesian dynamic model, which borrows information across doses without imposing stringent parametric assumptions on the shape of the dose-toxicity and dose-efficacy curves. An intuitive utility function that reflects the desirability trade-offs between efficacy and toxicity is used to guide the dose assignment and selection. We also discuss the extension of this design to handle delayed toxicity and efficacy. We conduct extensive simulation studies to examine the operating characteristics of the proposed method under various practical scenarios. The results show that the proposed design possesses good operating characteristics and is robust to the shape of the dose-toxicity and dose-efficacy curves.
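A minimal sketch of utility-guided dose selection is shown below, using independent Beta-Binomial toxicity and efficacy models as a stand-in for the flexible dynamic model described above; the counts and trade-off weights are hypothetical.

```python
# Sketch: utility-guided dose selection from toxicity/efficacy data.
# Independent Beta-Binomial posteriors per dose stand in for the paper's
# dynamic model; the utility trades off efficacy against toxicity.
import numpy as np

rng = np.random.default_rng(0)
n_tox = np.array([0, 1, 2, 4])      # toxicities observed at each dose (hypothetical)
n_eff = np.array([1, 3, 6, 7])      # responses observed at each dose (hypothetical)
n_pat = np.array([9, 9, 12, 12])    # patients treated at each dose
w_eff, w_tox = 1.0, 1.5             # desirability trade-off weights

draws = 4000
p_tox = rng.beta(1 + n_tox, 1 + n_pat - n_tox, size=(draws, n_pat.size))
p_eff = rng.beta(1 + n_eff, 1 + n_pat - n_eff, size=(draws, n_pat.size))
utility = w_eff * p_eff - w_tox * p_tox

print("posterior mean utility by dose:", utility.mean(axis=0).round(3))
print("recommended dose index:", int(np.argmax(utility.mean(axis=0))))
```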
Subject(s)
Bayes Theorem, Clinical Trials, Phase I as Topic, Clinical Trials, Phase II as Topic, Dose-Response Relationship, Drug, Research Design, Humans
ABSTRACT
MOTIVATION: The advent of new genomic technologies has resulted in the production of massive data sets. Analyses of these data require new statistical and computational methods. In this article, we propose one such method that is useful in selecting explanatory variables for prediction of a binary response. Although this problem has recently been addressed using penalized likelihood methods, we adopt a Bayesian approach that utilizes a mixture of non-local prior densities and point masses on the binary regression coefficient vectors. RESULTS: The resulting method, which we call iMOMLogit, provides improved performance in identifying true models and reducing estimation and prediction error in a number of simulation studies. More importantly, its application to several genomic datasets produces predictions that have high accuracy using far fewer explanatory variables than competing methods. We also describe a novel approach for setting prior hyperparameters by examining the total variation distance between the prior distributions on the regression parameters and the distribution of the maximum likelihood estimator under the null distribution. Finally, we describe a computational algorithm that can be used to implement iMOMLogit in ultrahigh-dimensional settings ([Formula: see text]) and provide diagnostics to assess the probability that this algorithm has identified the highest posterior probability model. AVAILABILITY AND IMPLEMENTATION: Software to implement this method can be downloaded at http://www.stat.tamu.edu/~amir/code.html. CONTACT: wwang7@mdanderson.org or vjohnson@stat.tamu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
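The hyperparameter idea can be illustrated numerically: compare a candidate prior on a scalar coefficient with the approximate null sampling distribution of its maximum likelihood estimator, N(0, 1/(n·I)), via the total variation distance. The sketch below uses a normal prior family and hypothetical values of n and the Fisher information purely for illustration; the method itself uses a mixture of non-local densities and point masses.

```python
# Sketch: total variation distance between a candidate prior on a scalar
# regression coefficient and the approximate null sampling distribution of
# its MLE, evaluated numerically on a grid.
import numpy as np
from scipy.stats import norm

def tv_distance(pdf1, pdf2, grid):
    diffs = np.abs(pdf1(grid) - pdf2(grid))
    return 0.5 * np.sum(diffs) * (grid[1] - grid[0])   # Riemann approximation

n, fisher_info = 200, 1.0                              # hypothetical values
mle_null = lambda b: norm.pdf(b, 0.0, np.sqrt(1.0 / (n * fisher_info)))

grid = np.linspace(-3, 3, 6001)
for tau in (0.05, 0.2, 1.0):                           # candidate prior scales
    prior = lambda b, t=tau: norm.pdf(b, 0.0, np.sqrt(t))
    print(f"tau = {tau}: TV distance = {tv_distance(prior, mle_null, grid):.3f}")
```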
Subject(s)
Genomics, Software, Algorithms, Animals, Bayes Theorem, Humans, Likelihood Functions
ABSTRACT
Recent advances in Bayesian hypothesis testing have led to the development of uniformly most powerful Bayesian tests, which represent an objective, default class of Bayesian hypothesis tests that have the same rejection regions as classical significance tests. Based on the correspondence between these two classes of tests, it is possible to equate the size of classical hypothesis tests with evidence thresholds in Bayesian tests, and to equate P values with Bayes factors. An examination of these connections suggests that recent concerns over the lack of reproducibility of scientific studies can be attributed largely to the conduct of significance tests at unjustifiably high levels of significance. To correct this problem, evidence thresholds required for the declaration of a significant finding should be increased to 25-50:1, and to 100-200:1 for the declaration of a highly significant finding. In terms of classical hypothesis tests, these evidence standards mandate the conduct of tests at the 0.005 or 0.001 level of significance.
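For a one-sided z test, the UMPBT correspondence maps a rejection threshold z_α to a Bayes factor threshold γ = exp(z_α²/2). The sketch below evaluates this map at a few significance levels; treated as an illustration only (two-sided tests and other statistics yield somewhat different numbers), the results line up with the 25-50:1 and 100-200:1 standards mentioned above.

```python
# Sketch: mapping one-sided z-test significance levels to Bayes-factor
# evidence thresholds via the correspondence gamma = exp(z_alpha^2 / 2).
import numpy as np
from scipy.stats import norm

for alpha in (0.05, 0.005, 0.001):
    z_alpha = norm.ppf(1 - alpha)          # one-sided critical value
    gamma = np.exp(z_alpha ** 2 / 2)       # implied Bayes-factor threshold
    print(f"alpha = {alpha:<6}  z = {z_alpha:5.3f}  BF threshold ~ {gamma:7.1f}")
```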
Subject(s)
Reproducibility of Results, Statistics as Topic/standards, Bayes Theorem, Models, Statistical
ABSTRACT
To evaluate the utility of automated deformable image registration (DIR) algorithms, it is necessary to evaluate both the registration accuracy of the DIR algorithm itself, as well as the registration accuracy of the human readers from whom the "gold standard" is obtained. We propose a Bayesian hierarchical model to evaluate the spatial accuracy of human readers and automatic DIR methods based on multiple image registration data generated by human readers and automatic DIR methods. To fully account for the locations of landmarks in all images, we treat the true locations of landmarks as latent variables and impose a hierarchical structure on the magnitude of registration errors observed across image pairs. DIR registration errors are modeled using Gaussian processes with reference prior densities on prior parameters that determine the associated covariance matrices. We develop a Gibbs sampling algorithm to efficiently fit our models to high-dimensional data, and apply the proposed method to analyze an image dataset obtained from a 4D thoracic CT study.
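A toy version of the latent-truth hierarchy can be sketched with conjugate Gibbs updates: observed landmark coordinates are noisy readings of a latent true location, with a separate error variance per reader or algorithm. The paper's model is richer (Gaussian-process errors with reference priors); everything below, including the simulated data, is illustrative only.

```python
# Sketch: toy Gibbs sampler for x[r, l] ~ N(mu[l], sigma2[r]), where mu[l] is
# the latent true coordinate of landmark l and sigma2[r] is reader/algorithm
# r's error variance (flat prior on mu, inverse-gamma IG(a0, b0) on sigma2).
import numpy as np

rng = np.random.default_rng(0)
R, L = 4, 60
true_mu = rng.normal(0, 10, L)
true_sd = np.array([0.5, 1.0, 1.5, 3.0])                  # reader/DIR error SDs
x = true_mu + rng.normal(0, true_sd[:, None], (R, L))     # observed coordinates

a0, b0 = 2.0, 1.0
sigma2 = np.ones(R)
keep = []
for it in range(2000):
    w = 1.0 / sigma2                                      # precision weights
    mu = rng.normal((w @ x) / w.sum(), np.sqrt(1.0 / w.sum()))   # latent truths
    resid2 = ((x - mu) ** 2).sum(axis=1)
    sigma2 = 1.0 / rng.gamma(a0 + L / 2, 1.0 / (b0 + 0.5 * resid2))
    if it >= 500:                                         # discard burn-in
        keep.append(np.sqrt(sigma2))
print("posterior mean error SD per reader:", np.mean(keep, axis=0).round(2))
```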
Subject(s)
Algorithms, Image Interpretation, Computer-Assisted/methods, Models, Statistical, Bayes Theorem, Biometry/methods, Computer Simulation, Expert Testimony, Four-Dimensional Computed Tomography/statistics & numerical data, Humans, Normal Distribution
ABSTRACT
OBJECTIVE: The purpose of this study was to develop a method of measuring rectal radiation dose in vivo during CT colonography (CTC) and assess the accuracy of size-specific dose estimates (SSDEs) relative to that of in vivo dose measurements. MATERIALS AND METHODS: Thermoluminescent dosimeter capsules were attached to a CTC rectal catheter to obtain four measurements of the CT radiation dose in 10 volunteers (five men and five women; age range, 23-87 years; mean age, 70.4 years). A fixed CT technique (supine and prone, 50 mAs and 120 kVp each) was used for CTC. SSDEs and percentile body habitus measurements were based on CT images and directly compared with in vivo dose measurements. RESULTS: The mean absorbed doses delivered to the rectum ranged from 8.8 to 23.6 mGy in the 10 patients, whose mean body habitus was in the 27th percentile among American adults 18-64 years old (range, 0.5-67th percentile). The mean SSDE error was 7.2% (range, 0.6-31.4%). CONCLUSION: This in vivo radiation dose measurement technique can be applied to patients undergoing CTC. Our measurements indicate that SSDEs are reasonable estimates of the rectal absorbed dose. The data obtained in this pilot study can be used as benchmarks for assessing dose estimates using other indirect methods (e.g., Monte Carlo simulations).
Subject(s)
Colonography, Computed Tomographic, Radiation Dosage, Rectum/radiation effects, Thermoluminescent Dosimetry/instrumentation, Adult, Aged, Aged, 80 and over, Female, Humans, Male, Middle Aged, Monte Carlo Method, Pilot Projects
ABSTRACT
Uniformly most powerful tests are statistical hypothesis tests that provide the greatest power against a fixed null hypothesis among all tests of a given size. In this article, the notion of uniformly most powerful tests is extended to the Bayesian setting by defining uniformly most powerful Bayesian tests to be tests that maximize the probability that the Bayes factor, in favor of the alternative hypothesis, exceeds a specified threshold. Like their classical counterpart, uniformly most powerful Bayesian tests are most easily defined in one-parameter exponential family models, although extensions outside of this class are possible. The connection between uniformly most powerful tests and uniformly most powerful Bayesian tests can be used to provide an approximate calibration between p-values and Bayes factors. Finally, issues regarding the strong dependence of resulting Bayes factors and p-values on sample size are discussed.
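A minimal sketch for the one-sided normal-mean case with known variance: for an evidence threshold γ, the UMPBT alternative is μ₁ = σ√(2 ln γ / n), and the event {BF₁₀ > γ} coincides with the classical rejection region z > √(2 ln γ). The values of γ and n below are arbitrary.

```python
# Sketch: UMPBT for a one-sided normal-mean test with known sigma.
# The alternative mu1 = sigma*sqrt(2*ln(gamma)/n) maximizes the probability
# that BF10 exceeds gamma; at the implied cutoff, BF10 equals gamma exactly.
import numpy as np

def umpbt_alternative(gamma, n, sigma=1.0):
    return sigma * np.sqrt(2.0 * np.log(gamma) / n)

def bf10(xbar, mu1, n, sigma=1.0):
    return np.exp(n * mu1 * xbar / sigma**2 - n * mu1**2 / (2 * sigma**2))

gamma, n = 10.0, 25
mu1 = umpbt_alternative(gamma, n)
z_cut = np.sqrt(2.0 * np.log(gamma))          # classical rejection cutoff in z units
xbar_cut = z_cut / np.sqrt(n)                 # same cutoff on the sample-mean scale
print("UMPBT alternative mu1:", round(mu1, 4))
print("BF at the cutoff (should equal gamma):", round(bf10(xbar_cut, mu1, n), 4))
```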
ABSTRACT
PURPOSE: Patient-reported outcomes (PROs) have been found to be significant predictors of clinical outcomes such as overall survival (OS), but the effect of demographic and clinical factors on the prognostic ability of PROs is less understood. Several PROs derived from the 12-item Short-Form Health Survey (SF-12) and M. D. Anderson Symptom Inventory (MDASI) were investigated for association with OS, with adjustments for other factors, including performance status. METHODS: A retrospective analysis was performed on data from 90 patients with stage IV non-small cell lung cancer. Several baseline PROs were added to a base Cox proportional hazards model to examine the marginal significance and improvement in model fit attributable to the PRO: mean MDASI symptom interference level; mean MDASI symptom severity level for five selected symptoms; SF-12 physical and mental component summaries; and the SF-12 general health item. Bootstrap resampling was used to assess the robustness of the findings. RESULTS: The MDASI mean interference level had a significant effect on OS (p = 0.007) when the model was not adjusted for interactions with other prognostic factors. Further exploration suggested the significance was due to an interaction with performance status (p = 0.001). The MDASI mean symptom severity level and the SF-12 physical component summary, mental component summary, and general health item did not have a significant effect on OS. CONCLUSIONS: Symptom interference adds prognostic information for OS in advanced lung cancer patients with poor performance status, even when demographic and clinical prognostic factors are accounted for.
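A minimal sketch of the kind of model involved, a Cox proportional hazards fit with a symptom-interference by performance-status interaction, is shown below using the lifelines package on simulated data; the column names and data-generating values are hypothetical, and the study's models adjusted for additional demographic and clinical factors.

```python
# Sketch: Cox model for overall survival with an interference x performance-
# status interaction, fitted to simulated data (all values hypothetical).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 200
interference = rng.uniform(0, 10, n)          # MDASI-like interference score
poor_ps = rng.integers(0, 2, n)               # 1 = poor performance status
age = rng.normal(62, 8, n)

# Simulate survival with an interference effect only in the poor-PS group.
hazard = 0.05 * np.exp(0.12 * interference * poor_ps + 0.4 * poor_ps)
event_time = rng.exponential(1.0 / hazard)
censor_time = rng.exponential(24, n)

df = pd.DataFrame({
    "os_months": np.minimum(event_time, censor_time),
    "death": (event_time <= censor_time).astype(int),
    "interference": interference,
    "poor_ps": poor_ps,
    "age": age,
})
df["interf_x_ps"] = df["interference"] * df["poor_ps"]

CoxPHFitter().fit(df, duration_col="os_months", event_col="death").print_summary()
```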
Subject(s)
Health Status Indicators, Patient Outcome Assessment, Quality of Life, Symptom Assessment, Adult, Aged, Aged, 80 and over, Carcinoma, Non-Small-Cell Lung, Female, Health Surveys, Humans, Kaplan-Meier Estimate, Lung Neoplasms/therapy, Male, Middle Aged, Prognosis, Proportional Hazards Models, Retrospective Studies, Self Report, Severity of Illness Index, Sickness Impact Profile, Surveys and Questionnaires
ABSTRACT
BACKGROUND: Trials of combination therapies for the treatment of cancer are playing an increasingly important role in the battle against this disease. To more efficiently handle the large number of combination therapies that must be tested, we propose a novel Bayesian phase II adaptive screening design to simultaneously select among possible treatment combinations involving multiple agents. METHODS: Our design is based on formulating the selection procedure as a Bayesian hypothesis testing problem in which the superiority of each treatment combination is equated to a single hypothesis. During the trial conduct, we use the current values of the posterior probabilities of all hypotheses to adaptively allocate patients to treatment combinations. RESULTS: Simulation studies show that the proposed design substantially outperforms the conventional multiarm balanced factorial trial design. The proposed design yields a significantly higher probability for selecting the best treatment while allocating substantially more patients to efficacious treatments. LIMITATIONS: The proposed design is most appropriate for trials that combine multiple agents and screen for efficacious combinations to be investigated further. CONCLUSIONS: The proposed Bayesian adaptive phase II screening design substantially outperformed the conventional complete factorial design. Our design allocates more patients to better treatments while providing higher power to identify the best treatment at the end of the trial.
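A minimal sketch of posterior-probability-based adaptive allocation: with independent Beta-Binomial models per combination, Monte Carlo draws give the probability that each combination has the highest response rate, which can then drive the next cohort's allocation. The counts below are hypothetical, and the actual design formulates the problem through explicit hypotheses rather than independent arms.

```python
# Sketch: adaptive allocation across treatment combinations by the posterior
# probability that each combination has the highest response rate.
import numpy as np

rng = np.random.default_rng(0)
responses = np.array([3, 5, 9, 4])       # responses per combination (hypothetical)
patients  = np.array([12, 12, 14, 10])   # patients treated per combination

draws = rng.beta(1 + responses, 1 + patients - responses, size=(10000, responses.size))
p_best = np.bincount(np.argmax(draws, axis=1), minlength=responses.size) / draws.shape[0]
alloc = p_best / p_best.sum()            # next-cohort allocation probabilities

print("P(best) by combination:", p_best.round(3))
print("allocation probabilities:", alloc.round(3))
```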
Subject(s)
Bayes Theorem, Drug Evaluation, Preclinical/methods, Drug Therapy, Combination/methods, Neoplasms/drug therapy, Randomized Controlled Trials as Topic/methods, Research Design, Area Under Curve, Humans, Models, Statistical, Sensitivity and Specificity, Treatment Outcome
ABSTRACT
This article proposes methodology for assessing goodness of fit in Bayesian hierarchical models. The methodology is based on comparing values of pivotal discrepancy measures (PDMs), computed using parameter values drawn from the posterior distribution, to known reference distributions. Because the resulting diagnostics can be calculated from standard output of Markov chain Monte Carlo algorithms, their computational costs are minimal. Several simulation studies are provided, each of which suggests that diagnostics based on PDMs have higher statistical power than comparable posterior-predictive diagnostic checks in detecting model departures. The proposed methodology is illustrated in a clinical application; an application to discrete data is described in supplementary material.
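As a concrete example of a pivotal discrepancy measure: for a normal model, the sum of squared standardized residuals evaluated at the true parameters follows a χ² distribution with n degrees of freedom, so evaluating it at posterior draws and comparing against that reference gives a PDM-style diagnostic. The sketch below uses stand-in posterior draws and simulated data purely for illustration.

```python
# Sketch: a pivotal discrepancy check for a normal model y_i ~ N(mu, sigma^2).
# At the true parameters, sum((y - mu)^2 / sigma^2) ~ chi-square(n); evaluating
# the same quantity at posterior draws gives a PDM-style diagnostic.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
y = rng.normal(5.0, 2.0, size=40)
n = y.size

# Stand-in posterior draws (in practice these come from the fitted model's MCMC)
mu_draws = rng.normal(y.mean(), y.std(ddof=1) / np.sqrt(n), size=2000)
sig2_draws = (n - 1) * y.var(ddof=1) / rng.chisquare(n - 1, size=2000)

pdm = ((y[None, :] - mu_draws[:, None]) ** 2 / sig2_draws[:, None]).sum(axis=1)
p_values = chi2.sf(pdm, df=n)            # upper-tail probabilities under the reference
print("proportion of extreme PDM p-values (<0.05):", np.mean(p_values < 0.05).round(3))
```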
Subject(s)
Bayes Theorem, Biometry/methods, Data Interpretation, Statistical, Models, Statistical, Computer Simulation, Markov Chains
ABSTRACT
OBJECTIVE: The purpose of this article is to determine how often unexpected (18)F-FDG PET/CT findings result in a change in management for patients with stage IV and clinically evident stage III melanoma with resectable disease according to conventional imaging. SUBJECTS AND METHODS: Thirty-two patients with oligometastatic stage IV and clinically evident stage III melanoma were identified by surgical oncologists according to the results of conventional imaging, which included contrast-enhanced CT of the chest, abdomen, and pelvis and MRI of the brain. The surgical plan included resection of known metastases or isolated limb perfusion with chemotherapy. Thirty-three FDG PET/CT scans were performed within 36 days of their contrast-enhanced CT. The impact of PET/CT was defined as the percentage of cases in which a change in the surgical plan resulted from the unanticipated PET/CT findings. RESULTS: PET/CT revealed unexpected melanoma metastases in 12% of scans (4/33). As a result, the surgery was canceled for two patients, and the planned approach was altered for another two patients to address the unexpected sites. In 6% of scans (2/33), the unexpected metastases were detected in the extremities, which were not included in conventional imaging. Three scans (9%) showed false-positive FDG-avid findings that proved to be benign by subsequent stability or resolution with no therapy. CONCLUSION: In patients with surgically treatable metastatic melanoma, FDG PET/CT can detect unexpected metastases that are missed or not imaged with conventional imaging, and can be considered as part of preoperative workup.
Subject(s)
Melanoma/diagnostic imaging, Melanoma/therapy, Multimodal Imaging, Neoplasm Metastasis/diagnostic imaging, Positron-Emission Tomography, Adult, Aged, Aged, 80 and over, Contrast Media, Female, Humans, Magnetic Resonance Imaging, Male, Melanoma/pathology, Middle Aged, Neoplasm Metastasis/pathology, Neoplasm Metastasis/therapy, Neoplasm Staging, Prospective Studies, Tomography, X-Ray Computed
ABSTRACT
Bayesian hypothesis testing procedures have gained increased acceptance in recent years. A key advantage that Bayesian tests have over classical testing procedures is their potential to quantify information in support of true null hypotheses. Ironically, default implementations of Bayesian tests prevent the accumulation of strong evidence in favor of true null hypotheses because associated default alternative hypotheses assign a high probability to data that are most consistent with a null effect. We propose the use of "nonlocal" alternative hypotheses to resolve this paradox. The resulting class of Bayesian hypothesis tests permits more rapid accumulation of evidence in favor of both true null hypotheses and alternative hypotheses that are compatible with standardized effect sizes of most interest in psychology.
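The contrast can be made concrete with a first-order product-moment (pMOM) density, π(β) = β² N(β; 0, τ)/τ, which vanishes at β = 0, versus a default normal (local) prior, which places substantial mass near zero; the scale τ and the grid below are arbitrary.

```python
# Sketch: a local (normal) alternative prior vs. a first-order pMOM nonlocal
# prior, pi(b) = b^2 * N(b; 0, tau) / tau, which integrates to 1 and is zero
# at b = 0. Comparing densities near zero shows why nonlocal alternatives can
# accumulate evidence for a true null faster.
import numpy as np
from scipy.stats import norm

tau = 0.2
b = np.linspace(-2, 2, 9)
local_prior = norm.pdf(b, 0, np.sqrt(tau))
pmom_prior = b ** 2 * norm.pdf(b, 0, np.sqrt(tau)) / tau

for bi, lo, nl in zip(b, local_prior, pmom_prior):
    print(f"b = {bi:5.2f}   local = {lo:.3f}   pMOM = {nl:.3f}")
```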
ABSTRACT
Clustering is a challenging problem in machine learning in which one attempts to group N objects into K0 groups based on P features measured on each object. In this article, we examine the case where N ≪ P and K0 is not known. Clustering in such high dimensional, small sample size settings has numerous applications in biology, medicine, the social sciences, clinical trials, and other scientific and experimental fields. Whereas most existing clustering algorithms either require the number of clusters to be known a priori or are sensitive to the choice of tuning parameters, our method does not require the prior specification of K0 or any tuning parameters. This represents an important advantage for our method because training data are not available in the applications we consider (i.e., in unsupervised learning problems). Without training data, estimating K0 and other hyperparameters, and thus applying alternative clustering algorithms, can be difficult and lead to inaccurate results. Our method is based on a simple transformation of the Gram matrix and application of the strong law of large numbers to the transformed matrix. If the correlation between features decays as the number of features grows, we show that the transformed feature vectors concentrate tightly around their respective cluster expectations in a low-dimensional space. This result simplifies the detection and visualization of the unknown cluster configuration. We illustrate the algorithm by applying it to 32 benchmarked microarray datasets, each containing thousands of genomic features measured on a relatively small number of tissue samples. Compared to 21 other commonly used clustering methods, we find that the proposed algorithm is faster and twice as accurate in determining the "best" cluster configuration.
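A generic sketch of the underlying idea (not the paper's exact transformation or its procedure for estimating K0): standardize the features, form the scaled Gram matrix G = XXᵀ/P, whose entries concentrate around cluster-level expectations as P grows, and cluster its rows. The simulated dimensions and signal strength below are arbitrary, and the number of clusters is supplied by hand here.

```python
# Sketch: clustering via a scaled Gram matrix when N << P. Entries of
# G = X X^T / P average over many features, so they concentrate around
# cluster-level expectations (law of large numbers across features).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
N, P = 30, 5000
labels_true = np.repeat([0, 1, 2], N // 3)
means = rng.normal(0, 0.3, (3, P))               # cluster-specific feature means
X = means[labels_true] + rng.standard_normal((N, P))

X = (X - X.mean(axis=0)) / X.std(axis=0)         # standardize features
G = X @ X.T / P                                  # scaled Gram matrix, N x N

Z = linkage(G, method="ward")                    # cluster the rows of G
labels_hat = fcluster(Z, t=3, criterion="maxclust")
print("estimated cluster sizes:", np.bincount(labels_hat)[1:])
```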
ABSTRACT
PURPOSE: To investigate the effects of increasing doses of angiotensin II on hepatic hemodynamics in the normal rabbit liver and in hepatic VX2 tumors by using dynamic contrast material-enhanced perfusion computed tomography (CT). MATERIALS AND METHODS: This study was approved by the institutional animal care and use committee. Solitary hepatic VX2 tumors were implanted into 12 rabbits. In each animal, perfusion CT of the liver was performed before (at baseline) and after hepatic arterial infusion of varying doses (0.1-50.0 µg/mL) of angiotensin II. Images were acquired continuously for 80 seconds after the start of the intravenous contrast material administration. Blood flow (BF), blood volume (BV), mean transit time (MTT), and capillary permeability-surface area product were calculated for the tumor and the adjacent and distant normal liver tissue. Generalized linear mixed models were used to estimate the effects of angiotensin II dose on outcome measures. RESULTS: Angiotensin II infusion increased contrast enhancement of the tumor and distal liver vessels. Tumor BF increased in a dose-dependent manner after administration of 0.5-25.0 µg/mL angiotensin II, but only the 2.5 µg/mL dose induced a significant increase in tumor BF compared with BF in the adjacent (68.0 vs 26.3 mL/min/100 g, P < .0001) and distant (68.0 vs 28.3 mL/min/100 g, P = .02) normal liver tissue. Tumor BV varied with angiotensin II dose but was greater than the BV of the adjacent and distant liver tissue at only the 2.5 µg/mL (4.8 vs 3.5 mL/100 g for adjacent liver [P < .0001], 4.8 vs 3.3 mL/100 g for distant liver [P = .0006]) and 10.0 µg/mL (4.9 vs 4.4 mL/100 g for adjacent liver [P = .007], 4.9 vs 4.3 mL/100 g for distant liver [P = .04]) doses. Tumor MTT was significantly shorter than the adjacent liver tissue MTT at angiotensin II doses of 2.5 µg/mL (9.7 vs 15.8 sec, P = .001) and 10.0 µg/mL (5.1 vs 13.2 sec, P = .007) and significantly shorter than the distant liver tissue MTT at 2.5 µg/mL only (9.7 vs 15.3 sec, P = .0006). The capillary permeability-surface area product for the tumor was higher than that for the adjacent liver tissue at the 2.5 µg/mL angiotensin II dose only (11.5 vs 8.1 mL/min/100 g, P = .01). CONCLUSION: Perfusion CT enables a mechanistic understanding of angiotensin II infusion in the liver and derivation of the optimal effective dose. The 2.5 µg/mL angiotensin II dose increases perfusion in hepatic VX2 tumors versus that in adjacent and distant normal liver tissue primarily by constricting normal distal liver vessels and in turn increasing tumor BF and BV.
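A minimal sketch of the type of analysis mentioned above (a mixed model for a perfusion outcome with a random intercept per animal), using statsmodels on simulated data; the variable names, dose-response shape, and noise levels are hypothetical.

```python
# Sketch: mixed model for a perfusion outcome (e.g., tumor blood flow) with a
# per-animal random intercept; data and effect sizes are simulated placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rabbits = np.repeat(np.arange(12), 5)                     # 12 animals, 5 doses each
dose = np.tile([0.1, 0.5, 2.5, 10.0, 50.0], 12)
animal_effect = rng.normal(0, 5, 12)[rabbits]             # random intercepts
tumor_bf = 30 + 8 * np.log(dose) + animal_effect + rng.normal(0, 6, rabbits.size)

df = pd.DataFrame({"rabbit": rabbits, "log_dose": np.log(dose), "tumor_bf": tumor_bf})
fit = smf.mixedlm("tumor_bf ~ log_dose", df, groups=df["rabbit"]).fit()
print(fit.summary())
```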