Results 1 - 20 of 37
1.
JAMA Cardiol ; 2024 May 01.
Article in English | MEDLINE | ID: mdl-38691380

ABSTRACT

Importance: Built environment plays an important role in development of cardiovascular disease. Large scale, pragmatic evaluation of built environment has been limited owing to scarce data and inconsistent data quality. Objective: To investigate the association between image-based built environment and the prevalence of cardiometabolic disease in urban cities. Design, Setting, and Participants: This cross-sectional study used features extracted from Google satellite images (GSI) to measure the built environment and link them with prevalence of cardiometabolic disease. Convolutional neural networks, light gradient-boosting machines, and activation maps were used to assess the association with health outcomes and identify feature associations with coronary heart disease (CHD), stroke, and chronic kidney disease (CKD). The study obtained aerial images from GSI covering census tracts in 7 cities (Cleveland, Ohio; Fremont, California; Kansas City, Missouri; Detroit, Michigan; Bellevue, Washington; Brownsville, Texas; and Denver, Colorado). The study used census tract-level data from the US Centers for Disease Control and Prevention's 500 Cities project. The data were originally collected from the Behavioral Risk Factor Surveillance System that surveyed people 18 years and older across the country. Analyses were conducted from February to December 2022. Exposures: GSI images of built environment and cardiometabolic disease prevalence. Main Outcomes and Measures: Census tract-level estimated prevalence of CHD, stroke, and CKD based on image-based built environment features. Results: The study obtained 31 786 aerial images from GSI covering 789 census tracts. Built environment features extracted from GSI using machine learning were associated with prevalence of CHD (R2 = 0.60), stroke (R2 = 0.65), and CKD (R2 = 0.64). 
The model performed better at distinguishing differences in cardiometabolic disease prevalence between cities than within cities (eg, highest within-city R2 = 0.39 vs between-city R2 = 0.64 for CKD). Adding GSI features improved on the model that only included age, sex, race, income, education, and composite indices for social determinants of health (R2 = 0.83 vs R2 = 0.76 for CHD; P <.001). Activation maps from the features revealed certain health-related built environment features, such as roads, highways, and railroads, and recreational facilities, such as amusement parks, arenas, and baseball parks. Conclusions and Relevance: In this cross-sectional study, a significant portion of cardiometabolic disease prevalence was associated with GSI-based built environment using convolutional neural networks.

2.
Circulation ; 149(16): 1298-1314, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38620080

ABSTRACT

Urban environments contribute substantially to the rising burden of cardiometabolic diseases worldwide. Cities are complex adaptive systems that continually exchange resources, shaping exposures relevant to human health such as air pollution, noise, and chemical exposures. In addition, urban infrastructure and provisioning systems influence multiple domains of health risk, including behaviors, psychological stress, pollution, and nutrition through various pathways (eg, physical inactivity, air pollution, noise, heat stress, food systems, the availability of green space, and contaminant exposures). Beyond cardiometabolic health, city design may also affect climate change through energy and material consumption that share many of the same drivers with cardiometabolic diseases. Integrated spatial planning focusing on developing sustainable compact cities could simultaneously create heart-healthy and environmentally healthy city designs. This article reviews current evidence on the associations between the urban exposome (totality of exposures a person experiences, including environmental, occupational, lifestyle, social, and psychological factors) and cardiometabolic diseases within a systems science framework, and examines urban planning principles (eg, connectivity, density, diversity of land use, destination accessibility, and distance to transit). We highlight critical knowledge gaps regarding built-environment feature thresholds for optimizing cardiometabolic health outcomes. Last, we discuss emerging models and metrics to align urban development with the dual goals of mitigating cardiometabolic diseases while reducing climate change through cross-sector collaboration, governance, and community engagement. This review demonstrates that cities represent crucial settings for implementing policies and interventions to simultaneously tackle the global epidemics of cardiovascular disease and climate change.


Subject(s)
Air Pollution , Urban Health , Humans , Cities/epidemiology , Air Pollution/adverse effects
3.
Eur Heart J ; 45(17): 1540-1549, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38544295

ABSTRACT

BACKGROUND AND AIMS: Built environment plays an important role in the development of cardiovascular disease. Tools to evaluate the built environment using machine vision and informatic approaches have been limited. This study aimed to investigate the association between machine vision-based built environment and prevalence of cardiometabolic disease in US cities. METHODS: This cross-sectional study used features extracted from Google Street View (GSV) images to measure the built environment and link them with prevalence of coronary heart disease (CHD). Convolutional neural networks, linear mixed-effects models, and activation maps were utilized to predict health outcomes and identify feature associations with CHD at the census tract level. The study obtained 0.53 million GSV images covering 789 census tracts in seven US cities (Cleveland, OH; Fremont, CA; Kansas City, MO; Detroit, MI; Bellevue, WA; Brownsville, TX; and Denver, CO). RESULTS: Built environment features extracted from GSV using deep learning predicted 63% of the census tract variation in CHD prevalence. The addition of GSV features improved a model that only included census tract-level age, sex, race, income, and education or composite indices of social determinant of health. Activation maps from the features revealed a set of neighbourhood features represented by buildings and roads associated with CHD prevalence. CONCLUSIONS: In this cross-sectional study, the prevalence of CHD was associated with built environment factors derived from GSV through deep learning analysis, independent of census tract demographics. Machine vision-enabled assessment of the built environment could potentially offer a more precise approach to identify at-risk neighbourhoods, thereby providing an efficient avenue to address and reduce cardiovascular health disparities in urban environments.


Subject(s)
Artificial Intelligence , Built Environment , Coronary Artery Disease , Humans , Cross-Sectional Studies , Coronary Artery Disease/epidemiology , Prevalence , Male , Female , United States/epidemiology , Middle Aged , Cities/epidemiology
5.
Am J Prev Cardiol ; 17: 100630, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38223296

ABSTRACT

Background: The care for patients with type 2 diabetes mellitus (T2DM) necessitates a multidisciplinary team approach to reduce cardiovascular (CV) risk, but implementation of effective integrated strategies has been limited. Methods and Results: We report 2-year results from a patient-centered, team-based intervention called CINEMA at University Hospitals Cleveland Medical Center. Patients with T2DM or prediabetes at high risk for CV events, including those with established atherosclerotic CVD, elevated coronary artery calcium score ≥100, chronic heart failure with reduced ejection fraction, chronic kidney disease (CKD) stages 2-4, and/or prevalent metabolic syndrome were included. From May 2020 through September 2022, 426 patients were enrolled in the CINEMA program. A total of 227 (54%) completed ≥1 follow-up visit after an initial baseline visit, with median [IQR] follow-up time of 4 [3-7] months and maximum follow-up time of 19 months. Mean age was 60 years, 47% were women, 37% were Black, 85% had prevalent T2DM, 48% had established ASCVD, 29% had chronic HF, 27% had CKD, and mean baseline 10-year ASCVD risk estimate was 25.1%; baseline use of an SGLT2i or GLP-1RA was 21% and 18%, respectively. Patients had significant reductions from baseline in body weight (-5.5 lbs), body mass index (-0.9 kg/m2), systolic (-3.6 mmHg) and diastolic (-1.2 mmHg) blood pressure, HbA1c (-0.5%), total (-10.7 mg/dL) and low-density lipoprotein (-9.0 mg/dL) cholesterol, and triglycerides (-13.5 mg/dL) (p<0.05 for all). Absolute 10-year predicted ASCVD risk decreased by ∼2.4% (p<0.001) with the intervention. In addition, rates of guideline-directed cardiometabolic medication prescriptions significantly increased during follow-up, with the most substantive changes seen in rates of SGLT2i and GLP-1RA use, which approximately tripled from baseline (21% to 57% for SGLT2i and 18% to 65% for GLP-1RA, p<0.001 for both).
Conclusions: The CINEMA program, an integrated, patient-centered, team-based intervention for patients with T2DM or prediabetes at high risk for cardiovascular disease has continued to demonstrate effectiveness with significant improvements in ASCVD risk factors and improved use of evidence-based therapies. Successful implementation and dissemination of this care delivery paradigm remains a key priority.

6.
medRxiv ; 2023 Mar 29.
Article in English | MEDLINE | ID: mdl-37034698

ABSTRACT

Background: Built environment plays an important role in development of cardiovascular disease. Tools to evaluate the built environment using machine vision and informatic approaches have been limited. We sought to investigate the association between machine vision-based built environment and prevalence of cardiometabolic disease in urban cities. Methods: This cross-sectional study used features extracted from Google Street View (GSV) images to measure the built environment and link them with prevalence of cardiometabolic disease. Convolutional neural networks, light gradient boosting machines, and activation maps were utilized to predict health outcomes and identify feature associations with coronary heart disease (CHD). The study obtained 0.53 million GSV images covering 789 census tracts in 7 cities (Cleveland, OH; Fremont, CA; Kansas City, MO; Detroit, MI; Bellevue, WA; Brownsville, TX; and Denver, CO). Analyses were conducted from February 2022 to December 2022. We used census tract-level data from the Centers for Disease Control and Prevention's PLACES dataset. Main outcomes included census tract-level estimated prevalence of CHD based on GSV built environment features. Results: Built environment features extracted from GSV using deep learning predicted 63% of the census tract variation in CHD prevalence. The ExtraTrees Regressor achieved the best result among all models, with the lowest average mean absolute error of 1.11% and root mean square error of 1.58. The addition of GSV features outperformed and improved a model that only included census tract-level age, sex, race, income, and education. Activation maps from the features revealed a set of neighborhood features represented by buildings and roads associated with CHD prevalence. Conclusions: In this cross-sectional study, a significant portion of CHD prevalence was explained by GSV-based built environment factors analyzed using deep learning, independent of census tract demographics.
Machine vision-enabled assessment of the built environment could play a significant role in designing and improving heart-healthy cities.
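The error measures quoted in this abstract (mean absolute error, root mean square error, and the share of variation explained) can be illustrated with a minimal sketch. The function name `regression_metrics` and the toy prevalence values are illustrative assumptions; this is not the authors' code or data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for paired observed/predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2: share of the variation in y_true accounted for by the predictions
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Hypothetical tract-level prevalence values (percent), for illustration only
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]
metrics = regression_metrics(observed, predicted)
```

MAE and RMSE are in the units of the outcome (here, percentage points of prevalence), whereas R^2 is unitless, which is why the abstract can report both kinds of figure for the same model.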

7.
IEEE Trans Pattern Anal Mach Intell ; 45(4): 4637-4649, 2023 Apr.
Article in English | MEDLINE | ID: mdl-35914037

ABSTRACT

Principal components analysis has been used to reduce the dimensionality of datasets for a long time. In this paper, we will demonstrate that in mode detection the components of smallest variance, the pettiest components, are more important. We prove that for a multivariate normal or Laplace distribution, we obtain boxes of optimal volume by implementing "pettiest component analysis," in the sense that their volume is minimal over all possible boxes with the same number of dimensions and fixed probability. This reduction in volume produces an information gain that is measured using active information. We illustrate our results with a simulation and a search for modal patterns of digitized images of hand-written numbers using the famous MNIST database; in both cases pettiest components work better than their competitors. In fact, we show that modes obtained with pettiest components generate better written digits for MNIST than principal components.
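As a rough illustration of the idea of keeping the components of smallest variance rather than the largest, the sketch below projects data onto the lowest-variance eigenvectors of the sample covariance matrix. It assumes NumPy is available; the function names and the synthetic data are hypothetical, and the sketch does not reproduce the paper's box-volume construction or its use of active information.

```python
import numpy as np


def variance_components(X):
    """Eigen-decompose the sample covariance of X (rows = observations)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    # eigh returns eigenvalues in ascending order for symmetric matrices
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvals, eigvecs


def project(X, eigvecs, k, pettiest=True):
    """Project centered data onto the k smallest- or largest-variance directions."""
    Xc = X - X.mean(axis=0)
    cols = eigvecs[:, :k] if pettiest else eigvecs[:, -k:]
    return Xc @ cols


rng = np.random.default_rng(0)
# Synthetic data with very different spreads along three axes (an assumption)
X = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.1])
eigvals, eigvecs = variance_components(X)
petty = project(X, eigvecs, k=1, pettiest=True)
principal = project(X, eigvecs, k=1, pettiest=False)
```

The variance of each projection equals the corresponding eigenvalue, so the "pettiest" projection is the one along which the data are most tightly concentrated.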

8.
BMC Bioinformatics ; 22(1): 22, 2021 Jan 12.
Article in English | MEDLINE | ID: mdl-33435872

ABSTRACT

BACKGROUND: In this study, we demonstrate that our modified Gene Set Enrichment Analysis (GSEA) method, drug perturbation GSEA (dpGSEA), can detect phenotypically relevant drug targets through a unique transcriptomic enrichment that emphasizes biological directionality of drug-derived gene sets. RESULTS: We detail our dpGSEA method and show its effectiveness in detecting specific perturbation of drugs in independent public datasets by confirming fluvastatin, paclitaxel, and rosiglitazone perturbation in gastroenteropancreatic neuroendocrine tumor cells. In drug discovery experiments, we found that dpGSEA was able to detect phenotypically relevant drug targets in previously published differentially expressed genes of CD4+T regulatory cells from immune responders and non-responders to antiviral therapy in HIV-infected individuals, such as those involved with virion replication, cell cycle dysfunction, and mitochondrial dysfunction. dpGSEA is publicly available at https://github.com/sxf296/drug_targeting . CONCLUSIONS: dpGSEA is an approach that uniquely enriches on drug-defined gene sets while considering directionality of gene modulation. We recommend dpGSEA as an exploratory tool to screen for possible drug targeting molecules.


Subject(s)
Computational Biology , Drug Evaluation, Preclinical/methods , Transcriptome/drug effects , CD4-Positive T-Lymphocytes/drug effects , CD4-Positive T-Lymphocytes/metabolism , Cell Cycle/drug effects , Cell Cycle/genetics , Humans , Phenotype , Probability
9.
J Immunol ; 203(8): 2194-2209, 2019 10 15.
Article in English | MEDLINE | ID: mdl-31541022

ABSTRACT

Tuberculosis (TB) remains a worldwide public health threat. Development of a more effective vaccination strategy to prevent pulmonary TB, the most common and contagious form of the disease, is a research priority for international TB control. A key to reaching this goal is improved understanding of the mechanisms of local immunity to Mycobacterium tuberculosis, the causative organism of TB. In this study, we evaluated global M. tuberculosis-induced gene expression in airway immune cells obtained by bronchoalveolar lavage (BAL) of individuals with latent TB infection (LTBI) and M. tuberculosis-naive controls. In prior studies, we demonstrated that BAL cells from LTBI individuals display substantial enrichment for M. tuberculosis-responsive CD4+ T cells compared with matched peripheral blood samples. We therefore specifically assessed the impact of the depletion of CD4+ and CD8+ T cells on M. tuberculosis-induced BAL cell gene expression in LTBI. Our studies identified 12 canonical pathways and a 47-gene signature that was both sensitive and specific for the contribution of CD4+ T cells to local recall responses to M. tuberculosis. In contrast, depletion of CD8+ cells did not identify any genes that fit our strict criteria for inclusion in this signature. Although BAL CD4+ T cells in LTBI displayed polyfunctionality, the observed gene signature predominantly reflected the impact of IFN-γ production on a wide range of host immune responses. These findings provide a standard for comparison of the efficacy of standard bacillus Calmette-Guérin vaccination as well as novel TB vaccines now in development at impacting the initial response to re-exposure to M. tuberculosis in the human lung.


Subject(s)
Bronchoalveolar Lavage , CD4-Positive T-Lymphocytes/immunology , Interferon-gamma/biosynthesis , Latent Tuberculosis/genetics , Mycobacterium tuberculosis/immunology , Adolescent , Adult , Female , Humans , Interferon-gamma/immunology , Latent Tuberculosis/immunology , Male , Middle Aged , Tuberculosis Vaccines/immunology , Young Adult
10.
Appl Stoch Models Bus Ind ; 35(2): 376-393, 2019.
Article in English | MEDLINE | ID: mdl-34135693

ABSTRACT

We propose a new method to find modes based on active information. We develop an algorithm called active information mode hunting (AIMH) that, when applied to the whole space, will say whether there are any modes present and where they are. We show AIMH is consistent and, given that information increases where probability decreases, it helps to overcome issues with the curse of dimensionality. The AIMH also reduces the dimensionality with no recourse to principal components. We illustrate the method in three ways: with a theoretical example (showing how it performs better than other mode hunting strategies), a real dataset business application, and a simulation.
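The abstract does not define active information. A common definition, assumed here, is the log-ratio log2(q/p) between the observed probability q of a region and its probability p under a uniform baseline; a clearly positive value suggests the region holds more mass than chance would give it. The sketch below computes that quantity for a box inside the unit hypercube. The function names are hypothetical and the AIMH algorithm itself is not reproduced.

```python
import math


def active_information(p_baseline, q_observed):
    """Active information in bits, assuming the definition log2(q / p)."""
    if p_baseline <= 0:
        raise ValueError("baseline probability must be positive")
    if q_observed == 0:
        return float("-inf")
    return math.log2(q_observed / p_baseline)


def region_active_information(points, lower, upper):
    """Active information of a box, for points in the unit hypercube.

    The baseline is uniform, so the box's baseline probability is its volume;
    the observed probability is the fraction of points falling inside it.
    """
    volume = 1.0
    for lo, hi in zip(lower, upper):
        volume *= hi - lo
    inside = sum(
        all(lo <= v <= hi for v, lo, hi in zip(pt, lower, upper))
        for pt in points
    )
    return active_information(volume, inside / len(points))
```

For example, a box covering 4% of the unit square that contains 30% of the points has active information of log2(0.30 / 0.04) bits under this definition.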

11.
Stat Appl Genet Mol Biol ; 17(1)2018 02 17.
Article in English | MEDLINE | ID: mdl-29453930

ABSTRACT

Unraveling interactions among variables such as genetic, clinical, demographic and environmental factors is essential to understand the development of common and complex diseases. To increase the power to detect such variable interactions associated with clinical time-to-event outcomes, we borrowed established concepts from random survival forest (RSF) models. We introduce a novel RSF-based pairwise interaction estimator and derive a randomization method with bootstrap confidence intervals for inferring interaction significance. Using various linear and nonlinear time-to-event survival models in simulation studies, we first show the efficiency of our approach: true pairwise interaction effects between variables are uncovered, even though they may not be accompanied by their corresponding main effects and may not be detected by standard semi-parametric regression modeling and test statistics used in survival analysis. Moreover, using an RSF-based cross-validation scheme for generating prediction estimators, we show that informative predictors may be inferred. We applied our approach to an HIV cohort study recording key host gene polymorphisms and their association with HIV change of tropism or AIDS progression. Altogether, this shows how linear or nonlinear pairwise statistical interactions of variables may be efficiently detected with a predictive value in observational studies with time-to-event outcomes.


Subject(s)
HIV Infections/genetics , HIV Infections/mortality , Models, Statistical , Acquired Immunodeficiency Syndrome/genetics , Cohort Studies , Confidence Intervals , DNA Copy Number Variations , Epistasis, Genetic , HIV Infections/virology , HIV-1/pathogenicity , HIV-1/physiology , Humans , Kaplan-Meier Estimate , Models, Genetic , Proportional Hazards Models , Viral Tropism , beta-Defensins/genetics
12.
Mol Cell Proteomics ; 15(7): 2356-65, 2016 07.
Article in English | MEDLINE | ID: mdl-27143410

ABSTRACT

Glioblastoma multiforme (GBM) is a genomically complex and aggressive primary adult brain tumor, with a median survival time of 12-14 months. The heterogeneous nature of this disease has made the identification and validation of prognostic biomarkers difficult. Using reverse phase protein array data from 203 primary untreated GBM patients, we have identified a set of 13 proteins with prognostic significance. Our protein signature predictive of glioblastoma (PROTGLIO) patient survival model was constructed and validated on independent data sets and was shown to significantly predict survival in GBM patients (log-rank test: p = 0.0009). Using a multivariate Cox proportional hazards model, we have shown that our PROTGLIO model is distinct from other known GBM prognostic factors (age at diagnosis, extent of surgical resection, postoperative Karnofsky performance score (KPS), treatment with temozolomide (TMZ) chemoradiation, and methylation of the MGMT gene). Tenfold cross-validation repetition of our model generation procedure confirmed validation of PROTGLIO. The model was further validated on an independent set of isocitrate dehydrogenase wild-type (IDHwt) lower grade gliomas (LGG), a portion of which progress rapidly to GBM. The PROTGLIO model contains proteins, such as Cox-2 and Annexin 1, involved in inflammatory response, pointing to potential therapeutic interventions. The PROTGLIO model is a simple and effective predictor of overall survival in glioblastoma patients, making it potentially useful in clinical practice of glioblastoma multiforme.


Subject(s)
Biomarkers, Tumor/metabolism , Brain Neoplasms/drug therapy , Dacarbazine/analogs & derivatives , Glioblastoma/drug therapy , Proteomics/methods , Adult , Aged , Aged, 80 and over , Brain Neoplasms/genetics , Brain Neoplasms/metabolism , Dacarbazine/administration & dosage , Dacarbazine/therapeutic use , Female , Glioblastoma/genetics , Glioblastoma/metabolism , Humans , Isocitrate Dehydrogenase/genetics , Male , Middle Aged , Prognosis , Proportional Hazards Models , Survival Analysis , Temozolomide , Young Adult
13.
Stat Anal Data Min ; 9(1): 12-42, 2016 Feb.
Article in English | MEDLINE | ID: mdl-27034730

ABSTRACT

We introduce a framework to build a survival/risk bump hunting model with a censored time-to-event response. Our Survival Bump Hunting (SBH) method is based on a recursive peeling procedure that uses a specific survival peeling criterion derived from non/semi-parametric statistics such as the hazard ratio, the log-rank test or the Nelson-Aalen estimator. To optimize the tuning parameter of the model and validate it, we introduce an objective function based on survival or prediction-error statistics, such as the log-rank test and the concordance error rate. We also describe two alternative cross-validation techniques adapted to the joint task of decision-rule making by recursive peeling and survival estimation. Numerical analyses show the importance of replicated cross-validation and the differences between criteria and techniques in both low and high-dimensional settings. Although several non-parametric survival models exist, none addresses the problem of directly identifying local extrema. We show how SBH efficiently estimates extreme survival/risk subgroups unlike other models. This provides an insight into the behavior of commonly used models and suggests alternatives to be adopted in practice. Finally, our SBH framework was applied to a clinical dataset. In it, we identified subsets of patients characterized by clinical and demographic covariates with a distinct extreme survival outcome, for which tailored medical interventions could be made. An R package PRIMsrc (Patient Rule Induction Method in Survival, Regression and Classification settings) is available on CRAN (Comprehensive R Archive Network) and GitHub.
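The recursive peeling idea can be sketched in a few lines. The version below is a simplified, generic PRIM-style peel that uses the mean of an uncensored response as its criterion; it is an assumption-laden illustration, not the SBH method (which uses survival criteria such as the log-rank statistic) and not the PRIMsrc implementation. The function names and the synthetic data are hypothetical.

```python
import random


def quantile(sorted_vals, q):
    """Order-statistic quantile without interpolation."""
    idx = min(len(sorted_vals) - 1, max(0, int(q * len(sorted_vals))))
    return sorted_vals[idx]


def peel(X, y, alpha=0.1, min_support=0.1):
    """Shrink a box one face at a time, keeping the peel with the highest mean y.

    At each step, roughly an alpha fraction of the in-box points is trimmed from
    the lower or upper face of one variable. Peeling stops when no candidate
    peel would leave at least min_support * n points in the box.
    """
    n, d = len(X), len(X[0])
    box = [(min(r[j] for r in X), max(r[j] for r in X)) for j in range(d)]
    inside = list(range(n))
    while True:
        best = None
        for j in range(d):
            vals = sorted(X[i][j] for i in inside)
            lo_cut = quantile(vals, alpha)
            hi_cut = quantile(vals, 1 - alpha)
            for cand in ((lo_cut, box[j][1]), (box[j][0], hi_cut)):
                keep = [i for i in inside if cand[0] <= X[i][j] <= cand[1]]
                # Skip peels that remove nothing or shrink the box too far
                if len(keep) == len(inside) or len(keep) < min_support * n:
                    continue
                mean_y = sum(y[i] for i in keep) / len(keep)
                if best is None or mean_y > best[0]:
                    best = (mean_y, j, cand, keep)
        if best is None:
            break
        _, j, cand, inside = best
        box[j] = cand
    return box, inside


# Synthetic example: the response is high only in the upper-right quadrant
rng = random.Random(0)
X_demo = [[rng.random(), rng.random()] for _ in range(400)]
y_demo = [1.0 if a > 0.5 and b > 0.5 else 0.0 for a, b in X_demo]
box_demo, inside_demo = peel(X_demo, y_demo, alpha=0.1, min_support=0.1)
```

In a survival setting the mean response would be replaced by a criterion computed from censored times, and the number of peeling steps would be chosen by cross-validation rather than by a fixed support threshold.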

14.
Proc Am Stat Assoc ; 2015: 650-664, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26798326

ABSTRACT

PRIMsrc is a novel implementation of a non-parametric bump hunting procedure, based on the Patient Rule Induction Method (PRIM), offering a unified treatment of outcome variables, including censored time-to-event (Survival), continuous (Regression) and discrete (Classification) responses. To fit the model, it uses a recursive peeling procedure with specific peeling criteria and stopping rules depending on the response. To validate the model, it provides an objective function based on prediction-error or other specific statistic, as well as two alternative cross-validation techniques, adapted to the task of decision-rule making and estimation in the three types of settings. PRIMsrc comes as an open source R package, including at this point: (i) a main function for fitting a Survival Bump Hunting model with various options allowing cross-validated model selection to control model size (#covariates) and model complexity (#peeling steps) and generation of cross-validated end-point estimates; (ii) parallel computing; (iii) various S3-generic and specific plotting functions for data visualization, diagnostic, prediction, summary and display of results. It is available on CRAN and GitHub.

15.
BMC Syst Biol ; 8: 72, 2014 Jun 23.
Article in English | MEDLINE | ID: mdl-24954394

ABSTRACT

BACKGROUND: To determine how diets high in saturated fat could increase polyp formation in the mouse model of intestinal neoplasia, ApcMin/+, we conducted large-scale metabolome analysis and association study of colon and small intestine polyp formation from plasma and liver samples of ApcMin/+ vs. wild-type littermates, kept on low vs. high-fat diet. Label-free mass spectrometry was used to quantify untargeted plasma and acyl-CoA liver compounds, respectively. Differences in contrasts of interest were analyzed statistically by unsupervised and supervised modeling approaches, namely Principal Component Analysis and Linear Model of analysis of variance. Correlation between plasma metabolite concentrations and polyp numbers was analyzed with a zero-inflated Generalized Linear Model. RESULTS: Plasma metabolome in parallel to promotion of tumor development comprises a clearly distinct profile in ApcMin/+ mice vs. wild type littermates, which is further altered by high-fat diet. Further, functional metabolomics pathway and network analyses in ApcMin/+ mice on high-fat diet revealed associations between polyp formation and plasma metabolic compounds including those involved in amino-acids metabolism as well as nicotinamide and hippuric acid metabolic pathways. Finally, we also show changes in liver acyl-CoA profiles, which may result from a combination of ApcMin/+-mediated tumor progression and high fat diet. The biological significance of these findings is discussed in the context of intestinal cancer progression. CONCLUSIONS: These studies show that high-throughput metabolomics combined with appropriate statistical modeling and large scale functional approaches can be used to monitor and infer changes and interactions in the metabolome and genome of the host under controlled experimental conditions. Further these studies demonstrate the impact of diet on metabolic pathways and its relation to intestinal cancer progression. 
Based on our results, metabolic signatures and metabolic pathways of polyposis and intestinal carcinoma have been identified, which may serve as useful targets for the development of therapeutic interventions.


Subject(s)
Adenomatous Polyposis Coli Protein/genetics , Genetic Predisposition to Disease , Intestinal Neoplasms/genetics , Intestinal Neoplasms/metabolism , Metabolomics/methods , Animals , Diet, High-Fat/adverse effects , Genotype , Humans , Intestinal Neoplasms/blood , Intestinal Polyps/blood , Intestinal Polyps/genetics , Intestinal Polyps/metabolism , Liver/drug effects , Liver/metabolism , Male , Mass Spectrometry , Mice
16.
Proc Am Stat Assoc ; 2014: 3366-3380, 2014 Aug.
Article in English | MEDLINE | ID: mdl-26997922

ABSTRACT

We introduce a survival/risk bump hunting framework to build a bump hunting model with a possibly censored time-to-event type of response and to validate model estimates. First, we describe the use of adequate survival peeling criteria to build a survival/risk bump hunting model based on recursive peeling methods. Our method called "Patient Recursive Survival Peeling" is a rule-induction method that makes use of specific peeling criteria such as hazard ratio or log-rank statistics. Second, to validate our model estimates and improve survival prediction accuracy, we describe a resampling-based validation technique specifically designed for the joint task of decision rule making by recursive peeling (i.e. decision-box) and survival estimation. This alternative technique, called "combined" cross-validation is done by combining test samples over the cross-validation loops, a design allowing for bump hunting by recursive peeling in a survival setting. We provide empirical results showing the importance of cross-validation and replication.
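One way to picture the "combined" cross-validation described here is to pool the held-out predictions from every fold and compute the validation statistic once on the pooled set, instead of averaging per-fold statistics. The sketch below shows that pooling pattern generically; the helper names and the `fit`, `predict` and `statistic` callables are hypothetical placeholders, and the survival-specific peeling and estimation steps of the paper are not reproduced.

```python
def k_folds(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def combined_cv(x, y, k, fit, predict, statistic):
    """Pool out-of-fold predictions, then compute one statistic on the pool."""
    pooled = [None] * len(x)
    for train, test in k_folds(len(x), k):
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in test:
            pooled[i] = predict(model, x[i])
    # A single statistic on all pooled test samples, not an average over folds
    return statistic(y, pooled)


# Toy usage: the "model" is just the training mean, scored by mean squared error
def fit_mean(x_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x_value):
    return model


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


score = combined_cv(list(range(6)), [1, 2, 3, 4, 5, 6], 3, fit_mean, predict_mean, mse)
```

Pooling matters most when each test fold is too small for the statistic to be estimated reliably on its own, which is a plausible motivation for the design the abstract describes.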

17.
J Proteome Res ; 11(9): 4476-87, 2012 Sep 07.
Article in English | MEDLINE | ID: mdl-22845868

ABSTRACT

Large-scale protein-protein interaction data sets have been generated for several species including yeast and human and have enabled the identification, quantification, and prediction of cellular molecular networks. Affinity purification-mass spectrometry (AP-MS) is the preeminent methodology for large-scale analysis of protein complexes, performed by immunopurifying a specific "bait" protein and its associated "prey" proteins. The analysis and interpretation of AP-MS data sets is, however, not straightforward. In addition, although yeast AP-MS data sets are relatively comprehensive, current human AP-MS data sets only sparsely cover the human interactome. Here we develop a framework for analysis of AP-MS data sets that addresses the issues of noise, missing data, and sparsity of coverage in the context of a current, real world human AP-MS data set. Our goal is to extend and increase the density of the known human interactome by integrating bait-prey and cocomplexed preys (prey-prey associations) into networks. Our framework incorporates a score for each identified protein, as well as elements of signal processing to improve the confidence of identified protein-protein interactions. We identify many protein networks enriched in known biological processes and functions. In addition, we show that integrated bait-prey and prey-prey interactions can be used to refine network topology and extend known protein networks.


Subject(s)
Protein Interaction Mapping/methods , Protein Interaction Maps/physiology , Proteins/chemistry , Proteins/metabolism , Proteomics/methods , Chromatography, Affinity , Cluster Analysis , Databases, Protein , Humans , Mass Spectrometry , Models, Biological , Statistics, Nonparametric
18.
BMC Bioinformatics ; 13: 128, 2012 Jun 08.
Article in English | MEDLINE | ID: mdl-22682516

ABSTRACT

BACKGROUND: Affinity-Purification Mass-Spectrometry (AP-MS) provides a powerful means of identifying protein complexes and interactions. Several important challenges exist in interpreting the results of AP-MS experiments. First, the reproducibility of AP-MS experimental replicates can be low, due both to technical variability and the dynamic nature of protein interactions in the cell. Second, the identification of true protein-protein interactions in AP-MS experiments is subject to inaccuracy due to high false negative and false positive rates. Several experimental approaches can be used to mitigate these drawbacks, including the use of replicated and control experiments and relative quantification to sensitively distinguish true interacting proteins from false ones. METHODS: To address the issues of reproducibility and accuracy of protein-protein interactions, we introduce a two-step method, called ROCS, which makes use of Indicator Prey Proteins to select reproducible AP-MS experiments, and of Confidence Scores to select specific protein-protein interactions. The Indicator Prey Proteins account for measures of protein identifiability as well as protein reproducibility, effectively allowing removal of outlier experiments that contribute noise and affect downstream inferences. The filtered set of experiments is then used in the Protein-Protein Interaction (PPI) scoring step. Prey protein scoring is done by computing a Confidence Score, which accounts for the probability of occurrence of prey proteins in the bait experiments relative to the control experiment, where the significance cutoff parameter is estimated by simultaneously controlling false positives and false negatives against metrics of false discovery rate and biological coherence, respectively. In summary, the ROCS method relies on automatic, objective criteria for parameter estimation and on error-controlled procedures.
RESULTS: We illustrate the performance of our method by applying it to five previously published AP-MS experiments, each containing well-characterized protein interactions, allowing for systematic benchmarking of ROCS. We show that our method may be used on its own to accurately identify specific, biologically relevant protein-protein interactions, or in combination with other AP-MS scoring methods to significantly improve inferences. CONCLUSIONS: Our method addresses important issues encountered in AP-MS datasets, making ROCS a very promising tool for this purpose, either on its own or in conjunction with other methods. We anticipate that our methodology may be used more generally in proteomics studies and databases where experimental reproducibility issues arise. The method is implemented in the R language and is freely available as an R package called "ROCS" from the CRAN repository http://cran.r-project.org/.


Subject(s)
Chromatography, Affinity/statistics & numerical data , Mass Spectrometry/statistics & numerical data , Protein Interaction Mapping/statistics & numerical data , Proteins/isolation & purification , Proteomics/statistics & numerical data , Chromatography, Affinity/methods , Confidence Intervals , Mass Spectrometry/methods , Probability , Protein Interaction Mapping/methods , Proteomics/methods , Reproducibility of Results
19.
Comput Stat Data Anal ; 56(7): 2317-2333, 2012 Jul 01.
Article in English | MEDLINE | ID: mdl-22711950

ABSTRACT

The paper addresses a common problem in the analysis of high-dimensional, high-throughput "omics" data: parameter estimation across multiple variables when the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are not reliable and variable-wise test statistics have low power, both due to a lack of degrees of freedom. In addition, it has been observed in this type of data that the variance increases as a function of the mean. We introduce a non-parametric adaptive regularization procedure that is innovative in that: (i) it employs a novel "similarity statistic"-based clustering technique to generate local-pooled or regularized shrinkage estimators of population parameters, and (ii) the regularization is done jointly on population moments, benefiting from C. Stein's result on inadmissibility, which implies that the usual sample variance estimator is improved by a shrinkage estimator using information contained in the sample mean. From these joint regularized shrinkage estimators, we derive regularized t-like statistics and show in simulation studies that they offer more statistical power in hypothesis testing than their standard sample counterparts, than regular common value-shrinkage estimators, or than estimators that simply ignore the information contained in the sample mean. Finally, we show that these estimators feature interesting properties of variance stabilization and normalization that can be used for preprocessing high-dimensional multivariate data. The method is available as an R package, called 'MVR' ('Mean-Variance Regularization'), downloadable from the CRAN website.
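
To make the idea of a regularized t-like statistic concrete, the following is a minimal Python sketch of plain common-value shrinkage, the simpler comparator the abstract mentions. It is not the MVR procedure: there is no similarity-statistic clustering and no joint regularization of means and variances. The function names, the fixed `weight` parameter, and the toy data are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def shrunken_variances(rows, weight=0.5):
    """Shrink each variable's sample variance toward the pooled (average) variance.

    `weight` is a hypothetical fixed shrinkage intensity in [0, 1]:
    0 keeps the raw sample variances, 1 replaces them with the pooled value.
    """
    raw = [variance(r) for r in rows]
    pooled = mean(raw)
    return [(1 - weight) * v + weight * pooled for v in raw]


def regularized_t(rows, weight=0.5):
    """One-sample t-like statistic per variable, computed with shrunken variances."""
    shrunk = shrunken_variances(rows, weight)
    return [mean(r) / sqrt(v / len(r)) for r, v in zip(rows, shrunk)]


# Toy data: three variables (rows), four samples each.
# In real "omics" data the number of variables far exceeds the sample size.
data = [
    [1.0, 1.1, 0.9, 1.0],   # tiny sample variance inflates the ordinary t
    [2.0, 4.0, 0.0, 2.0],
    [0.5, -0.5, 1.5, 0.5],
]

print(regularized_t(data, weight=0.0))  # ordinary t statistics
print(regularized_t(data, weight=0.5))  # shrunken, t-like statistics
```

With so few samples per variable, the first row's very small sample variance produces a large ordinary t statistic; borrowing variance information across variables pulls it down, which is the kind of instability that regularized estimators are meant to address.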

20.
Mol Cell Proteomics ; 11(6): M111.015479, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22337588

ABSTRACT

Allogeneic hematopoietic stem cell transplantation (SCT) is the only curative therapy for many malignant and nonmalignant conditions. Idiopathic pneumonia syndrome (IPS) is a frequently fatal complication that limits successful outcomes. Preclinical models suggest that IPS represents an immune-mediated attack on the lung involving elements of both the adaptive and the innate immune system. However, the etiology of IPS in humans is less well understood. To explore the disease pathway and uncover potential biomarkers of disease, we performed two separate label-free proteomics experiments defining the plasma protein profiles of allogeneic SCT patients with IPS. Samples obtained from SCT recipients without complications served as controls. The initial discovery study, intended to explore the disease pathway in humans, identified a set of 81 IPS-associated proteins. These data revealed similarities between the known IPS pathways in mice and the condition in humans, in particular in the acute phase response. In addition, pattern recognition pathways were judged to be significant as a function of development of IPS, and from this pathway we chose the lipopolysaccharide-binding protein (LBP) as a candidate molecular diagnostic for IPS and verified its increase as a function of disease using an ELISA. In a separately designed study, we identified protein-based classifiers that could predict, at day 0 of SCT, patients who: 1) progress to IPS and 2) respond to cytokine neutralization therapy. Using cross-validation strategies, we built highly predictive classifier models of both disease progression and therapeutic response. In sum, the data generated in this report confirm previous clinical and experimental findings, provide new insights into the pathophysiology of IPS, identify potential molecular classifiers of the condition, and uncover a set of markers potentially of interest for patient stratification as a basis for individualized therapy.


Subject(s)
Blood Proteins/metabolism , Hematopoietic Stem Cell Transplantation/adverse effects , Models, Biological , Pneumonia/blood , Acute-Phase Proteins/isolation & purification , Acute-Phase Proteins/metabolism , Anti-Inflammatory Agents, Non-Steroidal/therapeutic use , Biomarkers/blood , Blood Proteins/isolation & purification , Capillary Electrochromatography , Case-Control Studies , Disease Progression , Etanercept , Humans , Immunoglobulin G/therapeutic use , Pneumonia/drug therapy , Pneumonia/etiology , Pneumonia/pathology , Principal Component Analysis , Proteomics , Receptors, Tumor Necrosis Factor/therapeutic use , Reproducibility of Results , Transplantation, Homologous/adverse effects