ABSTRACT
Trauma is the leading cause of death among Americans between the ages of 1 and 46, costing more than $670 billion a year. Blunt and penetrating trauma can lead to cardiac and aortic injuries, with mortality varying with the location of the damage. Among those who reach the hospital alive, many may survive if the hemorrhage and cardiovascular injuries are diagnosed and treated adequately and promptly. Although echocardiography is often underused in the setting of cardiac trauma, it offers significant diagnostic and therapeutic potential because it is accessible in most settings, safe, and relatively noninvasive, and it can provide rapid and accurate trauma assessment in the hands of trained providers. This review article analyzes the pathophysiology of cardiac injuries in patients with trauma and the role of echocardiography in the accurate diagnosis of cardiac injury. It also offers a patient-centered, team-based early management plan with a treatment algorithm to help improve the quality of care for patients with cardiac trauma.
Subject(s)
Heart Injuries , Nonpenetrating Wounds , Penetrating Wounds , Adolescent , Adult , Child , Preschool Child , Echocardiography , Heart Injuries/diagnostic imaging , Heart Injuries/therapy , Humans , Infant , Middle Aged , Nonpenetrating Wounds/complications , Penetrating Wounds/complications , Penetrating Wounds/diagnosis , Young Adult
ABSTRACT
For survival endpoints in subgroup selection, a score conversion model is often used to convert the set of biomarkers for each patient into a univariate score, and the median of the univariate scores is then used to divide the patients into biomarker-positive and biomarker-negative subgroups. However, this may bias patient subgroup identification in 2 situations: (1) the treatment is equally effective for all patients and/or there is no subgroup difference; (2) the median of the univariate scores is an inappropriate cutoff because the sizes of the 2 subgroups differ substantially. We use a univariate composite score method to convert each patient's set of candidate biomarkers into a univariate response score. To address the first issue, we propose applying the likelihood ratio test (LRT) to assess the homogeneity of the sampled patients. In the context of identifying the subgroup of responders in an adaptive design to demonstrate improvement of treatment efficacy (adaptive power), we suggest that subgroup selection be carried out only if the LRT is significant. For the second issue, we use a likelihood-based change-point algorithm to find an optimal cutoff. Our simulation study shows that the type I error is generally controlled, while performing the LRT costs approximately 4.5% of the overall adaptive power to detect treatment effects for the simulation designs considered; furthermore, the change-point algorithm considerably outperforms the median cutoff when the subgroup sizes differ substantially.
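The contrast between a median cutoff and a likelihood-based change-point cutoff can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes a simple Gaussian mean-shift model with common variance, under which maximizing the two-group likelihood is equivalent to minimizing the pooled within-group sum of squares. The function name and the scores are hypothetical.

```python
from statistics import median


def change_point_cutoff(scores):
    """Return (cutoff, k) for the split of the sorted scores that maximizes a
    two-group Gaussian likelihood with common variance (assumed model), which
    is the split minimizing the pooled within-group sum of squares."""
    xs = sorted(scores)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two scores")
    best_k, best_ss = None, float("inf")
    for k in range(1, n):  # xs[:k] is the lower group, xs[k:] the upper group
        lo, hi = xs[:k], xs[k:]
        m_lo, m_hi = sum(lo) / k, sum(hi) / (n - k)
        ss = sum((x - m_lo) ** 2 for x in lo) + sum((x - m_hi) ** 2 for x in hi)
        if ss < best_ss:
            best_k, best_ss = k, ss
    cutoff = (xs[best_k - 1] + xs[best_k]) / 2  # midpoint between the two groups
    return cutoff, best_k


# Hypothetical univariate scores: 8 likely non-responders, 2 likely responders.
scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.05, 0.2, 0.1, 2.0, 2.2]
cutoff, k = change_point_cutoff(scores)
median_cutoff = median(scores)
print(cutoff, k, median_cutoff)
```

With such unbalanced groups the median falls inside the larger group, whereas the change-point split lands in the gap between the two groups.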
Subject(s)
Patient Selection , Precision Medicine/mortality , Precision Medicine/methods , Factual Databases/trends , Humans , Likelihood Functions , Lung Neoplasms/diagnosis , Lung Neoplasms/mortality , Lung Neoplasms/therapy , Precision Medicine/trends , Survival Rate/trends , Treatment Outcome
ABSTRACT
BACKGROUND: Next-generation sequencing (NGS) technologies have provided researchers with vast possibilities in various biological and biomedical research areas. Efficient data mining strategies are in high demand for large-scale comparative and evolutionary studies on the large amounts of data derived from NGS projects. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. METHODS: We report a novel procedure to analyse NGS data using topic modeling. It consists of four major steps: NGS data retrieval, preprocessing, topic modeling, and data mining using Latent Dirichlet Allocation (LDA) topic outputs. The NGS data set of Salmonella enterica strains was used as a case study to show the workflow of this procedure. The perplexity for different topic numbers and the convergence efficiency of Gibbs sampling were calculated and discussed with a view to achieving the best result from the proposed procedure. RESULTS: The topics output by the LDA algorithm could be treated as features of Salmonella strains that accurately describe the genetic diversity of the fliC gene in various serotypes. A two-way hierarchical clustering and data matrix analysis of the LDA-derived matrices successfully classified Salmonella serotypes based on the NGS data. CONCLUSION: The implementation of topic modeling in NGS data analysis provides a new way to elucidate genetic information from NGS data and to identify gene-phenotype relationships and biomarkers, especially in the era of biological and medical big data.
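The topic-modeling step with Gibbs sampling can be illustrated with a small collapsed Gibbs sampler for LDA. This is a textbook-style sketch using only the standard library, not the software used in the study. The preprocessing is an assumption: each sample is treated as a document and each integer id as a sequence-derived "word". All names, hyperparameters, and data are hypothetical.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampler for LDA.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic, topic_word) count matrices after `iters` sweeps.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        assign.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            assign[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token, then resample its topic from the
                # conditional distribution given all other assignments.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


# Hypothetical toy corpus: each "document" is one sample, each word id
# stands for a sequence feature (for example a k-mer) found in it.
docs = [[0, 1, 0, 1, 2], [0, 0, 1, 1], [3, 4, 3, 4, 5], [4, 3, 3, 5]]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2, vocab_size=6)
print(doc_topic)
```

The rows of `doc_topic` are the per-sample topic counts that could then serve as features for clustering.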
Subject(s)
Algorithms , Data Mining/methods , High-Throughput Nucleotide Sequencing/methods , Biomarkers/analysis , Cluster Analysis , Theoretical Models , Single Nucleotide Polymorphism/genetics , Salmonella/classification , Salmonella/genetics , Serotyping
ABSTRACT
BACKGROUND: Adolescent substance use and adolescent depression are both major public health problems, and they tend to co-occur. Thousands of articles on adolescent substance use or depression have been published, and extracting information from such large accumulated collections is labor intensive and time consuming. Topic modeling offers a computational tool to find relevant topics by capturing meaningful structure among collections of documents. METHODS: In this study, a total of 17,723 PubMed abstracts on adolescent substance use and depression published from 2000 to 2014 were downloaded, and Latent Dirichlet Allocation (LDA) was applied to perform text mining on the dataset. Word clouds were used to visually display the content of topics and to show the distribution of vocabulary over each topic. RESULTS: The LDA topics recaptured the search keywords used in PubMed and further revealed relevant issues, such as intervention programs; links between adolescent substance use and adolescent depression, such as sexual experience and violence; and risk factors for adolescent substance use, such as family factors and peer networks. Using trend analysis to explore the dynamics of topic proportions, we found that brain research was identified as a hot issue by the coefficient of the trend test. CONCLUSIONS: Topic modeling can segregate a large collection of articles into distinct themes, and it could be used as a tool to understand the literature, not only by recapturing known facts but also by discovering other relevant topics.
Subject(s)
Data Mining/methods , Depression/epidemiology , Substance-Related Disorders/epidemiology , Adolescent , Adolescent Behavior , Humans
ABSTRACT
Recently, personalized medicine has received great attention as a way to improve safety and effectiveness in drug development. Personalized medicine aims to provide medical treatment tailored to the patient's characteristics, such as genomic biomarkers and disease history, so that the benefit of treatment can be optimized. Subpopulation identification divides patients into several subgroups, each corresponding to an optimal treatment. For two subgroups, the multivariate Cox proportional hazards model is traditionally fitted and used to calculate a risk score when the outcome is a survival time endpoint. The median is commonly chosen as the cutoff value to separate patients. However, using the median as the cutoff value is quite subjective and may be inappropriate when the data are imbalanced. Here, we propose a novel tree-based method that adopts the relative risk tree algorithm to identify patient subgroups. After growing a relative risk tree, we apply k-means clustering to group the terminal nodes based on the averaged covariates. Because the performance of a single tree is well known to be quite unstable, we adopt an ensemble bagging method to improve it. A simulation study is conducted to compare the performance of our proposed method with that of the multivariate Cox model. The proposed method is also applied to two public cancer data sets for illustration.
Subject(s)
Algorithms , Biological Models , Precision Medicine/methods , Computer Simulation , Humans , Neoplasms/therapy , Risk
ABSTRACT
BACKGROUND: Topic modelling is an active research field in machine learning. While mainly used to build models from unstructured textual data, it offers an effective means of data mining where samples represent documents, and different biological endpoints or omics data represent words. Latent Dirichlet Allocation (LDA) is the most commonly used topic modelling method across a wide range of technical fields. However, model development can be arduous and tedious, and requires burdensome and systematic sensitivity studies in order to find the best set of model parameters. Often, time-consuming subjective evaluations are needed to compare models. Currently, research has yielded no easy way to choose the proper number of topics in a model beyond a major iterative approach. METHODS AND RESULTS: Based on analysis of the variation of statistical perplexity during topic modelling, a heuristic approach is proposed in this study to estimate the most appropriate number of topics. Specifically, the rate of perplexity change (RPC) as a function of the number of topics is proposed as a suitable selector. We test the stability and effectiveness of the proposed method on three markedly different types of ground-truth datasets: Salmonella next generation sequencing, pharmacological side effects, and textual abstracts on computational biology and bioinformatics (TCBB) from PubMed. CONCLUSION: The proposed RPC-based method is demonstrated to choose the best number of topics in three numerical experiments of widely different data types, and for databases of very different sizes. The work required was markedly less arduous than if full systematic sensitivity studies had been carried out with the number of topics as a parameter. We understand that additional investigation is needed to substantiate the method's theoretical basis, and to establish its generalizability in terms of dataset characteristics.
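The rate of perplexity change can be illustrated numerically. The sketch below assumes RPC is the absolute change in perplexity divided by the change in topic number between consecutive candidates. The selection rule shown (take the first candidate whose RPC is smaller than the next one) is an assumption, since the abstract does not state the rule, and the perplexity values are hypothetical.

```python
def rate_of_perplexity_change(topic_counts, perplexities):
    """RPC between consecutive candidates: |P_i - P_(i-1)| / (t_i - t_(i-1)).

    Entry j of the result belongs to topic_counts[j + 1].
    """
    if len(topic_counts) != len(perplexities):
        raise ValueError("inputs must have the same length")
    return [
        abs(perplexities[i] - perplexities[i - 1])
        / (topic_counts[i] - topic_counts[i - 1])
        for i in range(1, len(topic_counts))
    ]


def pick_topic_number(topic_counts, perplexities):
    """Assumed rule: first candidate where RPC stops decreasing."""
    rpc = rate_of_perplexity_change(topic_counts, perplexities)
    for j in range(len(rpc) - 1):
        if rpc[j] < rpc[j + 1]:
            return topic_counts[j + 1]
    return topic_counts[-1]  # RPC never turned upward


# Hypothetical perplexities for models fitted with 10..50 topics.
topic_counts = [10, 20, 30, 40, 50]
perplexities = [1000.0, 800.0, 700.0, 690.0, 650.0]
rpc = rate_of_perplexity_change(topic_counts, perplexities)
chosen = pick_topic_number(topic_counts, perplexities)
print(rpc, chosen)
```

Only one model per candidate topic number is needed, which is the sense in which this is cheaper than a full sensitivity study.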
Subject(s)
Computational Biology/methods , Data Mining/methods , Heuristics/physiology , Factual Databases , High-Throughput Nucleotide Sequencing
ABSTRACT
A class-imbalanced classifier is a decision rule to predict the class membership of new samples from an available data set where the class sizes differ considerably. When the class sizes are very different, most standard classification algorithms may favor the larger (majority) class, resulting in poor accuracy in minority class prediction. A class-imbalanced classifier typically modifies a standard classifier by a correction strategy or by incorporating a new strategy in the training phase to account for differential class sizes. This article reviews and evaluates some of the most important methods for class prediction of high-dimensional imbalanced data. The evaluation addresses the fundamental issues of the class-imbalanced classification problem: imbalance ratio, small disjuncts and overlap complexity, lack of data, and feature selection. Four class-imbalanced classifiers are considered: three standard classification algorithms, each coupled with an ensemble correction strategy, and one support vector machine (SVM)-based correction classifier. The three algorithms are (i) diagonal linear discriminant analysis (DLDA), (ii) random forests (RFs) and (iii) SVMs. The SVM-based correction classifier is SVM threshold adjustment (SVM-THR). A Monte Carlo simulation and five genomic data sets were used to illustrate the analysis and address the issues. The SVM-ensemble classifier appears to perform best when the class imbalance is not too severe. SVM-THR performs well if the imbalance is severe and the predictors are highly correlated. DLDA with feature selection can perform well without the ensemble correction.
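The idea behind threshold adjustment can be sketched on precomputed classifier scores. This is a generic illustration rather than the SVM-THR procedure evaluated in the article: the selection criterion (balanced accuracy) is an assumption, and the scores stand in for hypothetical SVM decision values.

```python
def adjust_threshold(scores, labels):
    """Choose the score threshold that maximizes balanced accuracy.

    labels: 1 = minority (positive) class, 0 = majority class.
    A sample is predicted positive when its score is >= the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best_t, best_bacc = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        bacc = 0.5 * (tp / n_pos + tn / n_neg)  # mean of sensitivity, specificity
        if bacc > best_bacc:
            best_t, best_bacc = t, bacc
    return best_t, best_bacc


# Hypothetical decision values: 8 majority samples, 2 minority samples.
# With the default threshold 0, every sample would be called majority.
scores = [-2.0, -1.5, -1.2, -1.0, -0.9, -0.8, -0.6, -0.4, -0.5, -0.3]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
threshold, balanced_accuracy = adjust_threshold(scores, labels)
print(threshold, balanced_accuracy)
```

In practice the threshold would be chosen on training or validation data, not on the samples being predicted.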
Subject(s)
Computational Biology/methods , Genetic Databases/classification , Algorithms , Discriminant Analysis , Female , Humans , Support Vector Machine
ABSTRACT
BACKGROUND: Advances in molecular technology have shifted new drug development toward targeted therapy for treatments expected to benefit subpopulations of patients. Adaptive signature design (ASD) has been proposed to identify the most suitable target patient subgroup to enhance treatment efficacy. There are two essential aspects in the development of biomarker adaptive designs: 1) an accurate classifier to identify the most appropriate treatment for patients, and 2) statistical tests to detect treatment effect in the relevant population and subpopulations. We propose using classification methods to identify patient subgroups and present a statistical testing strategy to detect treatment effects. METHODS: Diagonal linear discriminant analysis (DLDA) is used to identify targeted and non-targeted subgroups. For binary endpoints, DLDA is directly applied to classify patients into two subgroups; for continuous endpoints, a two-step procedure involving model fitting and determination of a cutoff point is used for subgroup classification. The proposed strategy includes tests for treatment effect in all patients and in a marker-positive subgroup, with a possible follow-up estimation of treatment effect in the marker-negative subgroup. The proposed method is compared to the ASD classification method using simulated datasets and two publicly available cancer datasets. RESULTS: The DLDA-based classifier performs well in terms of sensitivity, specificity, positive and negative predictive values, and accuracy in the simulation data and the two cancer datasets, with superior accuracy compared to the ASD method. The subgroup testing strategy is shown to be useful in detecting treatment effect in terms of power and control of study-wise error. CONCLUSION: Accuracy of a classifier is essential for adaptive designs. A poor classifier not only assigns patients to inappropriate treatments, but also reduces the power of the test, resulting in incorrect conclusions. The proposed procedure provides an effective approach for subgroup identification and subgroup analysis.
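Diagonal linear discriminant analysis itself is compact enough to sketch directly. The version below is a generic textbook formulation (class means, one pooled variance per feature, no covariances), not the authors' implementation. The data, labels, and function names are hypothetical, and every feature is assumed to have nonzero pooled variance.

```python
import math


def fit_dlda(X, y):
    """Fit diagonal LDA: per-class feature means plus one pooled variance
    per feature (off-diagonal covariances are ignored)."""
    classes = sorted(set(y))
    p = len(X[0])
    means, priors = {}, {}
    pooled = [0.0] * p
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
        priors[c] = len(rows) / len(X)
        for r in rows:
            for j in range(p):
                pooled[j] += (r[j] - means[c][j]) ** 2
    variances = [s / (len(X) - len(classes)) for s in pooled]
    return means, variances, priors


def predict_dlda(model, x):
    """Assign x to the class with the largest discriminant score."""
    means, variances, priors = model

    def score(c):
        dist = sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, means[c], variances))
        return math.log(priors[c]) - 0.5 * dist

    return max(means, key=score)


# Hypothetical biomarker profiles with marker-negative / marker-positive labels.
X = [[1.0, 5.0], [1.2, 4.8], [0.8, 5.1], [3.0, 5.0], [3.2, 4.9], [2.9, 5.2]]
y = ["neg", "neg", "neg", "pos", "pos", "pos"]
model = fit_dlda(X, y)
print(predict_dlda(model, [1.1, 5.0]), predict_dlda(model, [3.1, 5.0]))
```

Ignoring covariances keeps the number of estimated parameters small, which is why this classifier is often considered for high-dimensional biomarker data.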
Subject(s)
Adenocarcinoma/diagnosis , Tumor Biomarkers/analysis , Lung Neoplasms/diagnosis , Precision Medicine/methods , Research Design , Adenocarcinoma of Lung , Algorithms , Computer Simulation , Disease-Free Survival , Endpoint Determination , Humans , Statistical Models , Patient Selection
ABSTRACT
Drug-induced organ toxicity (DIOT) that leads to the removal of marketed drugs or termination of candidate drugs has been a leading concern for regulatory agencies and pharmaceutical companies. In safety studies, the genomic assays are conducted after the treatment so that drug-induced adverse effects can occur. Two types of biomarkers are observed: biomarkers of susceptibility and biomarkers of response. This paper presents a statistical model to distinguish the two types of biomarkers and procedures to identify susceptible subpopulations. The biomarkers identified are used to develop a classification model to identify the susceptible subpopulation. Two methods to identify susceptibility biomarkers were evaluated in terms of predictive performance in subpopulation identification, including sensitivity, specificity, and accuracy. Method 1 considered the traditional linear model with a variable-by-treatment interaction term, and Method 2 considered fitting a single predictor variable model using only treatment data. Monte Carlo simulation studies were conducted to evaluate the performance of the two methods and the impact of the subpopulation prevalence, probability of DIOT, and sample size on the predictive performance. Method 2 appeared to outperform Method 1, owing to the lack of power for testing the interaction effect. Important statistical issues and challenges regarding identification of preclinical DIOT biomarkers are discussed. In summary, identification of predictive biomarkers for treatment determination depends strongly on the subpopulation prevalence. When the proportion of the susceptible subpopulation is 1% or less, a very large sample size is needed to ensure that a sufficient number of DIOT responses is observed for biomarker and/or subpopulation identification.
Subject(s)
Drug-Related Side Effects and Adverse Reactions/genetics , Gene Expression Regulation/drug effects , Genetic Markers , Research Design/statistics & numerical data , Animals , Computer Simulation , Statistical Data Interpretation , Drug-Related Side Effects and Adverse Reactions/classification , Drug-Related Side Effects and Adverse Reactions/diagnosis , Drug-Related Side Effects and Adverse Reactions/epidemiology , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Humans , Linear Models , Logistic Models , Statistical Models , Monte Carlo Method , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Prevalence , Risk Assessment , Sample Size
ABSTRACT
BACKGROUND: The term "big data" is nowhere better deserved than in describing the ever-increasing size and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracy and effectiveness of traditional clustering methods diminish for large, high-dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means of clustering, or of overcoming clustering difficulties, in large biological and medical datasets. RESULTS: In this study, three topic model-derived clustering methods (highest probable topic assignment, feature selection, and feature extraction) are proposed and tested on the cluster analysis of three large datasets: a Salmonella pulsed-field gel electrophoresis (PFGE) dataset, a lung cancer dataset, and a breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the effectiveness of clustering on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. CONCLUSION: Topic modeling could be advantageously applied to large datasets in biological or medical research. The three proposed topic model-derived clustering methods yield clustering improvements for the three different data types. The resulting clusters represent true groupings and subgroupings in the data more effectively than those from traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets.
Subject(s)
Data Mining/methods , Algorithms , Breast Neoplasms/classification , Breast Neoplasms/mortality , Cluster Analysis , Pulsed-Field Gel Electrophoresis , Female , Humans , Lung Neoplasms/classification , Statistical Models , Salmonella/classification , Salmonella/isolation & purification , Survival Analysis
ABSTRACT
BACKGROUND: Gene-based analysis has become popular in genomic research because of its appealing biological and statistical properties compared with those of single-locus analysis. However, few, if any, studies have discussed mapping of expression quantitative trait loci (eQTL) in a gene-based framework. None has discussed ancestry-informative eQTL or investigated their roles in pharmacogenetics by integrating single nucleotide polymorphism (SNP)-based eQTL (s-eQTL) and gene-based eQTL (g-eQTL). RESULTS: In this g-eQTL mapping study, the transcript expression levels of genes (transcript-level genes; T-genes) were correlated with the SNPs of genes (sequence-level genes; S-genes) by using a gene-based partial least squares (PLS) method. Ancestry-informative transcripts were identified using a rank-score-based multivariate association test, and ancestry-informative eQTL were identified using Fisher's exact test. Furthermore, key ancestry-predictive eQTL were selected in a flexible discriminant analysis. We analyzed SNPs and gene expression of 210 independent people of African, Asian and European descent. We identified numerous cis- and trans-acting g-eQTL and s-eQTL for each population by using PLS. We observed ancestry information enriched in eQTL. Furthermore, we identified 2 ancestry-informative eQTL associated with adverse drug reactions and/or drug response. Rs1045642, located on MDR1, is an ancestry-informative eQTL (P = 2.13E-13, using Fisher's exact test) associated with adverse drug reactions to amitriptyline and nortriptyline and drug responses to morphine. Rs20455, located in KIF6, is an ancestry-informative eQTL (P = 2.76E-23, using Fisher's exact test) associated with the response to statin drugs (e.g., pravastatin and atorvastatin). Ancestry-informative eQTL of drug biotransformation genes were also observed; cross-population cis-acting expression regulators included SPG7, TAP2, SLC7A7, and CYP4F2. Finally, we also identified key ancestry-predictive eQTL and established classification models with promising training and testing accuracies in separating samples from close populations. CONCLUSIONS: In summary, we developed a gene-based PLS procedure and a SAS macro for identifying g-eQTL and s-eQTL. We established data archives of eQTL for global populations. The program and data archives are accessible at http://www.stat.sinica.edu.tw/hsinchou/genetics/eQTL/HapMapII.htm. Finally, the results from our investigations regarding the interrelationship between eQTL, ancestry information, and pharmacodynamics provide rich resources for future eQTL studies and practical applications in population genetics and medical genetics.
Subject(s)
Genome , Pharmacogenetics , Quantitative Trait Loci , Humans , Single Nucleotide Polymorphism
ABSTRACT
Juvenile male rhesus monkeys treated with methylphenidate hydrochloride (MPH) to evaluate genetic and behavioral toxicity were observed after 14 mo of treatment to have delayed pubertal progression with impaired testicular descent and reduced testicular volume. Further evaluation of animals dosed orally twice a day with (i) 0.5 mL/kg of vehicle (n = 10), (ii) 0.15 mg/kg of MPH increased to 2.5 mg/kg (low dose, n = 10), or (iii) 1.5 mg/kg of MPH increased to 12.5 mg/kg (high dose, n = 10) for a total of 40 mo revealed that testicular volume was significantly reduced (P < 0.05) at months 15 to 19 and month 27. Testicular descent was significantly delayed (P < 0.05) in the high-dose group. Significantly lower serum testosterone levels were detected in both the low- (P = 0.0017) and high-dose (P = 0.0011) animals through month 33 of treatment. Although serum inhibin B levels were increased overall in low-dose animals (P = 0.0328), differences between groups disappeared by the end of the study. Our findings indicate that MPH administration, beginning before puberty, and which produced clinically relevant blood levels of the drug, impaired pubertal testicular development until ~5 y of age. It was not possible to resolve whether MPH delayed the initiation of the onset of puberty or reduced the early tempo of the developmental process. Regardless, deficits in testicular volume and hormone secretion disappeared over the 40-mo observation period, suggesting that the impact of MPH on puberty is not permanent.
Subject(s)
Central Nervous System Stimulants/pharmacology , Methylphenidate/pharmacology , Sexual Maturation/drug effects , Animals , Macaca mulatta , Male , Testis/drug effects , Testis/growth & development , Testosterone/blood
ABSTRACT
The use of benchmark dose (BMD) calculations for dichotomous or continuous responses is well established in the risk assessment of cancer and noncancer endpoints. In some cases, responses to exposure are categorized in terms of ordinal severity effects such as none, mild, adverse, and severe. Such responses can be assessed using categorical regression (CATREG) analysis. However, while CATREG has been employed to compare the benchmark approach and the no-adverse-effect-level (NOAEL) approach in determining a reference dose, the utility of CATREG for risk assessment remains unclear. This study proposes a CATREG model to extend the BMD approach to ordered categorical responses by modeling severity levels as censored interval limits of a standard normal distribution. The BMD is calculated as a weighted average of the BMDs obtained at dichotomous cutoffs for each adverse severity level above the critical effect, with the weights being proportional to the reciprocal of the expected loss at the cutoff under the normal probability model. This approach provides a link between the current BMD procedures for dichotomous and continuous data. We estimate the CATREG parameters using a Markov chain Monte Carlo simulation procedure. The proposed method is demonstrated using examples of aldicarb and urethane, each with several categories of severity levels. Simulation studies show that the BMD and BMDL (lower confidence bound on the BMD) obtained with the proposed method are quite compatible with the corresponding estimates from the existing methods for dichotomous and continuous data; the difference depends mainly on the choice of cutoffs for the severity levels.
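The weighting step in the BMD calculation amounts to simple arithmetic, sketched below. The per-cutoff BMDs and expected losses are taken as given inputs, because the abstract does not specify how the expected loss is computed. The function name and all numbers are hypothetical.

```python
def weighted_bmd(bmds, expected_losses):
    """Weighted average of per-cutoff BMDs, with weights proportional to the
    reciprocal of the expected loss at each cutoff."""
    if len(bmds) != len(expected_losses) or not bmds:
        raise ValueError("need one expected loss per BMD")
    if any(loss <= 0 for loss in expected_losses):
        raise ValueError("expected losses must be positive")
    inverse = [1.0 / loss for loss in expected_losses]
    total = sum(inverse)
    weights = [v / total for v in inverse]  # normalized so the weights sum to 1
    return sum(w * b for w, b in zip(weights, bmds)), weights


# Hypothetical BMDs at two severity cutoffs and their expected losses.
bmd, weights = weighted_bmd([2.0, 4.0], [1.0, 3.0])
print(bmd, weights)
```

A cutoff with a smaller expected loss receives a larger weight, so the combined BMD is pulled toward the BMD at that cutoff.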
Subject(s)
Hazardous Substances/administration & dosage , Hazardous Substances/toxicity , Risk Assessment/methods , Aldicarb/administration & dosage , Aldicarb/toxicity , Animals , Benchmarking , Computer Simulation , Drug Dose-Response Relationship , Ethanol/administration & dosage , Ethanol/toxicity , Female , Humans , Male , Markov Chains , Mice , Biological Models , Statistical Models , Monte Carlo Method , No-Observed-Adverse-Effect Level , Regression Analysis , Risk Assessment/statistics & numerical data , Urethane/administration & dosage , Urethane/toxicity
ABSTRACT
BACKGROUND: Lung cancer is the leading cause of cancer-related death worldwide. Tremendous research efforts have been devoted to improving treatment procedures, but the average five-year overall survival rates are still less than 20%. Many biomarkers have been identified for predicting survival; challenges arise, however, in translating the findings into clinical practice because of their inconsistency and irreproducibility. In this study, we proposed an approach that identifies predictive genes through pathways. RESULTS: The microarrays from Shedden et al. were used as the training set, and the log-rank test was performed to select potential signature genes. We focused on 24 cancer-related pathways from 4 biological databases. A scoring scheme was developed with the Cox hazard regression model, and patients were divided into two groups based on the median score. Subsequently, predictability and generalizability were evaluated by 2-fold cross-validation and by a resampling test in 4 independent datasets, respectively. A set of 16 genes related to apoptosis execution was demonstrated to have good predictability as well as generalizability in more than 700 lung adenocarcinoma patients and was reproducible in 4 independent datasets. This signature set was shown to have superior performance compared to 6 other published signatures. Furthermore, the corresponding risk scores derived from the set were found to be associated with the efficacy of the anti-cancer drug ZD-6474 targeting EGFR. CONCLUSIONS: In summary, we presented a new approach to identify reproducible survival predictors for lung adenocarcinoma, and the identified genes may serve as both prognostic and predictive biomarkers in the future.
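The scoring and median-split step can be sketched as follows. The sketch assumes the risk score is the usual linear predictor of a Cox model (the sum of coefficient times expression over the signature genes). The coefficients here are hypothetical placeholders, not fitted values, and the expression values are made up.

```python
from statistics import median


def risk_groups(expression, coefficients):
    """Compute a linear risk score per patient and split at the median.

    expression: one row of gene expression values per patient.
    coefficients: one (assumed Cox regression) coefficient per gene.
    """
    scores = [sum(b * x for b, x in zip(coefficients, row)) for row in expression]
    cut = median(scores)
    groups = ["high" if s > cut else "low" for s in scores]
    return scores, cut, groups


# Hypothetical two-gene signature and four patients.
coefficients = [0.5, -0.2]
expression = [[2.0, 1.0], [4.0, 0.0], [1.0, 3.0], [3.0, 2.0]]
scores, cut, groups = risk_groups(expression, coefficients)
print(scores, cut, groups)
```

The two groups would then be compared with a survival test such as the log-rank test, which is outside the scope of this sketch.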
Subject(s)
Adenocarcinoma/genetics , Lung Neoplasms/genetics , Adenocarcinoma of Lung , Aged , Female , Neoplastic Gene Expression Regulation , Gene Library , Humans , Lung Neoplasms/drug therapy , Lung Neoplasms/pathology , Male , Middle Aged , Proportional Hazards Models , Reproducibility of Results , Survival Rate , Transcriptome
ABSTRACT
BACKGROUND: Pulsed field gel electrophoresis (PFGE) is currently the method most widely and routinely used by the Centers for Disease Control and Prevention (CDC) and state health labs in the United States for Salmonella surveillance and outbreak tracking. Major drawbacks of commercially available PFGE analysis programs have been their difficulty in dealing with large datasets and the limited availability of analysis tools. New analytical tools for PFGE data mining are needed in order to make full use of valuable data in large surveillance databases. RESULTS: In this study, a software package was developed consisting of five types of bioinformatics approaches explored and implemented for the analysis and visualization of PFGE fingerprinting. The approaches include PFGE band standardization, Salmonella serotype prediction, hierarchical cluster analysis, distance matrix analysis and two-way hierarchical cluster analysis. PFGE band standardization makes cross-group analysis of large datasets possible. The Salmonella serotype prediction approach allows users to predict serotypes of Salmonella isolates based on their PFGE patterns. The hierarchical cluster analysis approach could be used to clarify subtypes and phylogenetic relationships among groups of PFGE patterns. The distance matrix and two-way hierarchical cluster analysis tools allow users to directly visualize the similarities/dissimilarities of any two individual patterns and the inter- and intra-serotype relationships of two or more serotypes, and provide a summary of the overall relationships between user-selected serotypes as well as the distinguishable band markers of these serotypes. The functionalities of these tools were illustrated on PFGE fingerprinting data from PulseNet of CDC. CONCLUSIONS: The bioinformatics approaches included in the software package developed in this study were integrated with the PFGE database to enhance the data mining of PFGE fingerprints. Fast and accurate prediction makes it possible to elucidate Salmonella serotype information before conventional serological methods are pursued. The development of bioinformatics tools to distinguish the PFGE markers and serotype-specific patterns will enhance PFGE data retrieval, interpretation and serotype identification and will likely accelerate source tracking to identify the Salmonella isolates implicated in foodborne diseases.
Asunto(s)
Biología Computacional/métodos , Electroforesis en Gel de Campo Pulsado/métodos , Salmonella/clasificación , Análisis por Conglomerados , Minería de Datos , Bases de Datos Genéticas , Humanos , Salmonella/química , Salmonella/genética , Serotipificación
RESUMEN
BACKGROUND: The meninges (arachnoid and pial membranes) and associated vasculature (MAV) and choroid plexus are important in maintaining cerebrospinal fluid (CSF) generation and flow. MAV vasculature was previously observed to be adversely affected by environmentally induced hyperthermia (EIH) and more so by a neurotoxic amphetamine (AMPH) exposure. Herein, microarray and RT-PCR analyses were used to compare the gene expression profiles between choroid plexus and MAV under control conditions and at 3 hours and 1 day after EIH or AMPH exposure. Since AMPH and EIH are so disruptive to vasculature, genes related to vasculature integrity and function were of interest. RESULTS: Our data show that, under control conditions, many of the genes with relatively high expression in both the MAV and choroid plexus are also abundant in many epithelial tissues. These genes function in transport of water, ions, and solutes, and likely play a role in CSF regulation. Most genes that help form the blood-brain barrier (BBB) and tight junctions were also highly expressed in MAV but not in choroid plexus. In MAV, exposure to EIH and more so to AMPH decreased the expression of BBB-related genes such as Sox18, Ocln, and Cldn5, but they were much less affected in the choroid plexus. There was a correlation between the genes related to reactive oxidative stress and damage that were significantly altered in the MAV and choroid plexus after either EIH or AMPH. However, AMPH (at 3 hours) significantly affected about 5 times as many genes as EIH in the MAV, while in the choroid plexus EIH affected more genes than AMPH. Several unique genes that are not specifically related to vascular damage increased to a much greater extent after AMPH than after EIH in the MAV (Lbp, Reg3a, Reg3b, Slc15a1, Sct and Fst) and choroid plexus (Bmp4, Dio2 and Lbp). 
CONCLUSIONS: Our study indicates that the disruption of choroid plexus function and the damage produced by AMPH and EIH are significant, but the changes may not be as pronounced as they are in the MAV, particularly for AMPH. Expression profiles in the MAV and choroid plexus differed to some extent, and the differences were not restricted to vascular-related genes.
Asunto(s)
Encéfalo/metabolismo , Líquido Cefalorraquídeo/metabolismo , Plexo Coroideo/metabolismo , Meninges/metabolismo , Anfetamina/toxicidad , Aracnoides/irrigación sanguínea , Aracnoides/metabolismo , Barrera Hematoencefálica/efectos de los fármacos , Barrera Hematoencefálica/metabolismo , Encéfalo/irrigación sanguínea , Plexo Coroideo/irrigación sanguínea , Plexo Coroideo/efectos de los fármacos , Ambiente , Fiebre , Humanos , Meninges/irrigación sanguínea , Meninges/efectos de los fármacos , Proteínas Asociadas a Pancreatitis , Transcriptoma
RESUMEN
BACKGROUND: The two most important considerations in evaluating survival prediction models are 1) predictability - the ability to predict survival risks accurately and 2) reproducibility - the ability to generalize to samples generated from different studies. We present approaches for assessing the reproducibility of survival risk score predictions across medical centers. METHODS: Reproducibility was evaluated in terms of consistency and transferability. Consistency is the agreement of risk scores predicted between two centers. Transferability from one center to another center is the agreement of the risk scores of the second center predicted by each of the two centers. The transferability can be: 1) model transferability - whether a predictive model developed from one center can be applied to predict the samples generated from other centers and 2) signature transferability - whether signature markers of a predictive model developed from one center can be applied to predict the samples from other centers. We considered eight prediction models, including two clinical models, two gene expression models, and their combinations. Predictive performance of the eight models was evaluated by several common measures. Correlation coefficients between predicted risk scores of different centers were computed to assess reproducibility (consistency and transferability). RESULTS: Two public datasets, lung cancer data generated from four medical centers and colon cancer data generated from two medical centers, were analyzed. The risk score estimates for lung cancer patients predicted by three of the four centers agree reasonably well. In general, a good prediction model showed better cross-center consistency and transferability. 
The risk scores for the colon cancer patients from one medical center (Moffitt) that were predicted by the clinical models developed from another medical center (Vanderbilt) showed excellent model transferability and signature transferability. CONCLUSIONS: This study illustrates an analytical approach to assessing the reproducibility of predictive models and signatures. Based on the analyses of the two cancer datasets, we conclude that the models with clinical variables appear to perform reasonably well, with a high degree of consistency and transferability. More investigation is needed into the cross-study reproducibility of prediction models that include gene expression data.
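The model transferability check described above can be sketched as a correlation between two sets of risk scores for the same patients. This is a minimal illustration, assuming a linear risk score (for example a proportional hazards linear predictor) and the Pearson coefficient; the abstract does not specify the type of correlation, and the coefficients and covariate values below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def risk_scores(coefs, samples):
    """Linear risk score: sum of coefficient * covariate for each patient."""
    return [sum(c * v for c, v in zip(coefs, row)) for row in samples]


# Hypothetical data (assumption): covariates of four patients from center B,
# and coefficients of models fitted separately at centers A and B.
samples_b = [[1.0, 0.2], [0.5, 1.1], [2.0, 0.3], [1.5, 1.5]]
coefs_a = [0.8, 0.4]
coefs_b = [0.7, 0.5]

# Transferability from A to B: agreement between the scores that each
# center's model assigns to center B's patients.
scores_from_a = risk_scores(coefs_a, samples_b)
scores_from_b = risk_scores(coefs_b, samples_b)
r = pearson(scores_from_a, scores_from_b)
print(round(r, 3))
```

A coefficient near 1 would indicate that the model developed at center A ranks and scales center B's patients much as center B's own model does.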
Asunto(s)
Neoplasias del Colon/mortalidad , Interpretación Estadística de Datos , Neoplasias Pulmonares/mortalidad , Humanos , Valor Predictivo de las Pruebas , Pronóstico , Modelos de Riesgos Proporcionales , Reproducibilidad de los Resultados , Riesgo , Sobrevida , Resultado del Tratamiento
RESUMEN
Despite the fact that benefit-risk analysis is a necessary component of the review of new drugs for potential regulatory approval in the presence of known adverse side effects, and of the review of already-approved drugs for possible withdrawal from the market when unanticipated adverse events are discovered, formal quantitative tools for benefit-risk analysis are few. This paper proposes a quantitative method that utilizes receiver operating characteristic (ROC) curves to find an optimal dose of a drug that maximizes the differential between the benefit of the intended effect and the risk of adverse side effects, where costs associated with lack of benefit and risk can be incorporated. The method can be applied separately to subpopulations of different sensitivities and to different adverse events to give a full picture of the trade-offs between the benefit afforded by the drug and the risk it incurs, and potentially to allow the drug to be approved only selectively for specific subpopulations, or at different doses for different subpopulations.
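The dose selection idea can be sketched as follows. This is only an illustration under stated assumptions: each candidate dose is treated as one operating point with a probability of benefit and a probability of an adverse event, and the objective is a cost-weighted difference between the two. The function name, the weights, and the probabilities are hypothetical and are not taken from the paper.

```python
def optimal_dose(doses, p_benefit, p_risk, cost_benefit=1.0, cost_risk=1.0):
    """Return (dose, objective) maximizing cost_benefit*P(benefit) - cost_risk*P(risk).

    Each dose is one operating point; sweeping the doses traces an ROC-like
    curve of benefit probability against adverse-event probability.
    """
    if not (len(doses) == len(p_benefit) == len(p_risk)) or not doses:
        raise ValueError("inputs must be non-empty and of equal length")
    objective = [
        cost_benefit * b - cost_risk * r for b, r in zip(p_benefit, p_risk)
    ]
    best = max(range(len(doses)), key=lambda i: objective[i])
    return doses[best], objective[best]


# Hypothetical dose-response values (assumption), for illustration only.
doses = [10, 20, 40, 80]
p_benefit = [0.20, 0.55, 0.80, 0.90]
p_risk = [0.02, 0.05, 0.20, 0.55]

equal_cost_dose, _ = optimal_dose(doses, p_benefit, p_risk)
# Weighting adverse events three times as heavily shifts the optimum downward.
risk_averse_dose, _ = optimal_dose(doses, p_benefit, p_risk, cost_risk=3.0)
print(equal_cost_dose, risk_averse_dose)
```

Running the same function on dose-response values for a more sensitive subpopulation, or for a different adverse event, would give the subpopulation-specific comparison the abstract describes.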
Asunto(s)
Evaluación de Medicamentos/normas , Preparaciones Farmacéuticas/administración & dosificación , Preparaciones Farmacéuticas/normas , Curva ROC , Relación Dosis-Respuesta a Droga , Evaluación de Medicamentos/métodos , Efectos Colaterales y Reacciones Adversas Relacionados con Medicamentos , Humanos , Medición de Riesgo/métodos , Medición de Riesgo/normas , Resultado del Tratamiento
RESUMEN
The Adverse Event Reporting System (AERS) is the primary database designed to support the Food and Drug Administration (FDA) postmarketing safety surveillance program for all approved drugs and therapeutic biologic products. Most current disproportionality analyses focus on detecting potential adverse events (AEs) involving only a single drug and a single AE. In this paper, we present a data mining biclustering technique based on the singular value decomposition to extract local regions of association for a safety study. The analysis yields a collection of biclusters, each representing an association between a set of drugs and the corresponding set of adverse events. The significance of each bicluster can be tested using disproportionality analysis. Individual drug-event combinations can then be tested further. A safety data set consisting of 193 drugs with 8453 adverse events is analyzed as an illustration.
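Testing a bicluster by disproportionality can be illustrated with the proportional reporting ratio (PRR), one commonly used disproportionality statistic. The abstract does not say which statistic the authors apply, so the choice of PRR is an assumption, and the drug names, event names, and report counts below are hypothetical.

```python
def prr_for_block(counts, drugs, events):
    """Proportional reporting ratio for a (drug set, event set) block.

    counts maps (drug, event) -> number of reports. The 2x2 table is:
      a: block drugs, block events     b: block drugs, other events
      c: other drugs, block events     d: other drugs, other events
    PRR = [a / (a + b)] / [c / (c + d)].
    """
    a = b = c = d = 0
    for (drug, event), n in counts.items():
        in_drugs, in_events = drug in drugs, event in events
        if in_drugs and in_events:
            a += n
        elif in_drugs:
            b += n
        elif in_events:
            c += n
        else:
            d += n
    if a + b == 0 or c == 0:
        raise ValueError("PRR is undefined for this block")
    return (a / (a + b)) / (c / (c + d))


# Hypothetical report counts (assumption), for illustration only.
counts = {
    ("drugA", "rash"): 30, ("drugA", "nausea"): 10,
    ("drugB", "rash"): 20, ("drugB", "nausea"): 20,
    ("drugC", "rash"): 5, ("drugC", "nausea"): 95,
    ("drugC", "headache"): 50,
}
prr = prr_for_block(counts, drugs={"drugA", "drugB"}, events={"rash"})
print(prr)
```

Passing a single drug and a single event as the two sets reduces this to the usual one-drug, one-event PRR, which corresponds to the follow-up test of individual combinations.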
Asunto(s)
Sistemas de Registro de Reacción Adversa a Medicamentos , Minería de Datos/métodos , Bases de Datos Factuales , Efectos Colaterales y Reacciones Adversas Relacionados con Medicamentos , Humanos , Detección de Señal Psicológica
RESUMEN
This article proposes a method for multiclass classification problems using ensembles of multinomial logistic regression models. A multinomial logit model is used as a base classifier in ensembles from random partitions of predictors. The multinomial logit model can be applied to each mutually exclusive subset of the feature space without variable selection. By combining multiple models the proposed method can handle a huge database without a constraint needed for analyzing high-dimensional data, and the random partition can improve the prediction accuracy by reducing the correlation among base classifiers. The proposed method is implemented using R, and the performance including overall prediction accuracy, sensitivity, and specificity for each category is evaluated on two real data sets and simulation data sets. To investigate the quality of prediction in terms of sensitivity and specificity, the area under the receiver operating characteristic (ROC) curve (AUC) is also examined. The performance of the proposed model is compared to a single multinomial logit model and it shows a substantial improvement in overall prediction accuracy. The proposed method is also compared with other classification methods such as the random forest, support vector machines, and random multinomial logit model.
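The two structural ingredients of the ensemble, a random partition of the predictors into mutually exclusive subsets and the combination of the base classifiers' outputs, can be sketched as below. The article's implementation is in R; this Python sketch is only illustrative. It assumes the base models' class probabilities are combined by simple averaging, which the abstract does not state, and it uses hypothetical probability vectors in place of fitted multinomial logit models.

```python
import random


def random_partition(n_features, n_subsets, rng):
    """Split feature indices 0..n_features-1 into mutually exclusive subsets."""
    indices = list(range(n_features))
    rng.shuffle(indices)
    # Deal the shuffled indices round-robin so subset sizes differ by at most 1.
    return [sorted(indices[i::n_subsets]) for i in range(n_subsets)]


def average_probabilities(prob_lists):
    """Average the class-probability vectors produced by the base models."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [sum(p[k] for p in prob_lists) / n_models for k in range(n_classes)]


# One random partition of 10 predictors into 3 subsets; each subset would be
# used to fit one multinomial logit base model (fitting is omitted here).
partition = random_partition(10, 3, random.Random(0))

# Hypothetical class probabilities from the three base models for one sample.
base_outputs = [
    [0.6, 0.3, 0.1],
    [0.5, 0.2, 0.3],
    [0.7, 0.2, 0.1],
]
avg = average_probabilities(base_outputs)
predicted_class = max(range(len(avg)), key=lambda k: avg[k])
print(partition, avg, predicted_class)
```

Repeating the partition with different random seeds and pooling the resulting base models would give a larger ensemble, with each base model seeing only a low-dimensional slice of the predictors.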