Búsqueda | BVS Bolivia

1.

Area under the curve-optimized synthesis of prediction models from a meta-analytical perspective.

Yoneoka, Daisuke; Omae, Katsuhiro; Henmi, Masayuki; Eguchi, Shinto.

Res Synth Methods ; 14(2): 234-246, 2023 Mar.

Artículo en Inglés | MEDLINE | ID: mdl-36424356

RESUMEN

The number of clinical prediction models sharing the same prediction task has increased in the medical literature. However, evidence synthesis methodologies that use the results of these prediction models have not been sufficiently studied, particularly in the context of meta-analysis settings where only summary statistics are available. In particular, we consider the following situation: we want to predict an outcome Y, that is not included in our current data, while the covariate data are fully available. In addition, the summary statistics from prior studies, which share the same prediction task (i.e., the prediction of Y), are available. This study introduces a new method for synthesizing the summary results of binary prediction models reported in the prior studies using a linear predictor under a distributional assumption between the current and prior studies. The method provides an integrated predictor combining all predictors reported in the prior studies with weights. The vector of the weights is designed to achieve the hypothetical improvement of area under the receiver operating characteristic curve (AUC) on the current available data under a practical situation where there are different sets of covariates in the prior studies. We observe a counterintuitive aspect in typical situations where a part of weight components in the proposed method becomes negative. It implies that flipping the sign of the prediction results reported in each individual study would improve the overall prediction performance. Finally, numerical and real-world data analysis were conducted and showed that our method outperformed conventional methods in terms of AUC.

Asunto(s)

Reglas de Decisión Clínica , Modelos Estadísticos , Curva ROC

2.

Copula-based measures of asymmetry between the lower and upper tail probabilities.

Kato, Shogo; Yoshiba, Toshinao; Eguchi, Shinto.

Stat Pap (Berl) ; 63(6): 1907-1929, 2022.

Artículo en Inglés | MEDLINE | ID: mdl-35283558

RESUMEN

We propose a copula-based measure of asymmetry between the lower and upper tail probabilities of bivariate distributions. The proposed measure has a simple form and possesses some desirable properties as a measure of asymmetry. The limit of the proposed measure as the index goes to the boundary of its domain can be expressed in a simple form under certain conditions on copulas. A sample analogue of the proposed measure for a sample from a copula is presented and its weak convergence to a Gaussian process is shown. Another sample analogue of the presented measure, which is based on a sample from a distribution on R 2 , is given. Simple methods for interval and region estimation are presented. A simulation study is carried out to investigate the performance of the proposed sample analogues and methods for interval estimation. As an example, the presented measure is applied to daily returns of S&P500 and Nikkei225. A trivariate extension of the proposed measure and its sample analogue are briefly discussed. Supplementary Information: The online version contains supplementary material available at 10.1007/s00362-022-01297-w.

3.

Generalized quasi-linear mixed-effects model.

Saigusa, Yusuke; Eguchi, Shinto; Komori, Osamu.

Stat Methods Med Res ; 31(7): 1280-1291, 2022 07.

Artículo en Inglés | MEDLINE | ID: mdl-35286226

RESUMEN

The generalized linear mixed model (GLMM) is one of the most common method in the analysis of longitudinal and clustered data in biological sciences. However, issues of model complexity and misspecification can occur when applying the GLMM. To address these issues, we extend the standard GLMM to a nonlinear mixed-effects model based on quasi-linear modeling. An estimation algorithm for the proposed model is provided by extending the penalized quasi-likelihood and the restricted maximum likelihood which are known in the GLMM inference. Also, the conditional AIC is formulated for the proposed model. The proposed model should provide a more flexible fit than the GLMM when there is a nonlinear relation between fixed and random effects. Otherwise, the proposed model is reduced to the GLMM. The performance of the proposed model under model misspecification is evaluated in several simulation studies. In the analysis of respiratory illness data from a randomized controlled trial, we observe the proposed model can capture heterogeneity; that is, it can detect a patient subgroup with specific clinical character in which the treatment is effective.

Asunto(s)

Algoritmos , Modelos Lineales , Proyectos de Investigación , Simulación por Computador , Humanos , Funciones de Verosimilitud , Ensayos Clínicos Controlados Aleatorios como Asunto

4.

Novel robust time series analysis for long-term and short-term prediction.

Okamura, Hiroshi; Osada, Yutaka; Nishijima, Shota; Eguchi, Shinto.

Sci Rep ; 11(1): 11938, 2021 06 07.

Artículo en Inglés | MEDLINE | ID: mdl-34099758

RESUMEN

Nonlinear phenomena are universal in ecology. However, their inference and prediction are generally difficult because of autocorrelation and outliers. A traditional least squares method for parameter estimation is capable of improving short-term prediction by estimating autocorrelation, whereas it has weakness to outliers and consequently worse long-term prediction. In contrast, a traditional robust regression approach, such as the least absolute deviations method, alleviates the influence of outliers and has potentially better long-term prediction, whereas it makes accurately estimating autocorrelation difficult and possibly leads to worse short-term prediction. We propose a new robust regression approach that estimates autocorrelation accurately and reduces the influence of outliers. We then compare the new method with the conventional least squares and least absolute deviations methods by using simulated data and real ecological data. Simulations and analysis of real data demonstrate that the new method generally has better long-term and short-term prediction ability for nonlinear estimation problems using spawner-recruitment data. The new method provides nearly unbiased autocorrelation even for highly contaminated simulated data with extreme outliers, whereas other methods fail to estimate autocorrelation accurately.

5.

A Unified Formulation of k-Means, Fuzzy c-Means and Gaussian Mixture Model by the Kolmogorov-Nagumo Average.

Komori, Osamu; Eguchi, Shinto.

Entropy (Basel) ; 23(5)2021 Apr 24.

Artículo en Inglés | MEDLINE | ID: mdl-33923177

RESUMEN

Clustering is a major unsupervised learning algorithm and is widely applied in data mining and statistical data analyses. Typical examples include k-means, fuzzy c-means, and Gaussian mixture models, which are categorized into hard, soft, and model-based clusterings, respectively. We propose a new clustering, called Pareto clustering, based on the Kolmogorov-Nagumo average, which is defined by a survival function of the Pareto distribution. The proposed algorithm incorporates all the aforementioned clusterings plus maximum-entropy clustering. We introduce a probabilistic framework for the proposed method, in which the underlying distribution to give consistency is discussed. We build the minorize-maximization algorithm to estimate the parameters in Pareto clustering. We compare the performance with existing methods in simulation studies and in benchmark dataset analyses to demonstrate its highly practical utilities.

6.

Quasi-linear Cox proportional hazards model with cross- L₁ penalty.

Omae, Katsuhiro; Eguchi, Shinto.

BMC Med Res Methodol ; 20(1): 182, 2020 07 06.

Artículo en Inglés | MEDLINE | ID: mdl-32631280

RESUMEN

BACKGROUND: To accurately predict the response to treatment, we need a stable and effective risk score that can be calculated from patient characteristics. When we evaluate such risks from time-to-event data with right-censoring, Cox's proportional hazards model is the most popular for estimating the linear risk score. However, the intrinsic heterogeneity of patients may prevent us from obtaining a valid score. It is therefore insufficient to consider the regression problem with a single linear predictor. METHODS: we propose the model with a quasi-linear predictor that combines several linear predictors. This provides a natural extension of Cox model that leads to a mixture hazards model. We investigate the property of the maximum likelihood estimator for the proposed model. Moreover, we propose two strategies for getting the interpretable estimates. The first is to restrict the model structure in advance, based on unsupervised learning or prior information, and the second is to obtain as parsimonious an expression as possible in the parameter estimation strategy with cross- L1 penalty. The performance of the proposed method are evaluated by simulation and application studies. RESULTS: We showed that the maximum likelihood estimator has consistency and asymptotic normality, and the cross- L1-regularized estimator has root-n consistency. Simulation studies show these properties empirically, and application studies show that the proposed model improves predictive ability relative to Cox model. CONCLUSIONS: It is essential to capture the intrinsic heterogeneity of patients for getting more stable and effective risk score. The proposed hazard model can capture such heterogeneity and achieve better performance than the ordinary linear Cox proportional hazards model.

Asunto(s)

Proyectos de Investigación , Simulación por Computador , Humanos , Probabilidad , Modelos de Riesgos Proporcionales , Análisis de Supervivencia

7.

The power-integrated discriminant improvement: An accurate measure of the incremental predictive value of additional biomarkers.

Hayashi, Kenichi; Eguchi, Shinto.

Stat Med ; 38(14): 2589-2604, 2019 06 30.

Artículo en Inglés | MEDLINE | ID: mdl-30859601

RESUMEN

The predictive performance of biomarkers is a central concern in biomedical research. This is often evaluated by comparing two statistical models: a "new" model incorporating additional biomarkers and an "old" model without them. In 2008, the integrated discrimination improvement (IDI) was proposed for cases when the response variable is binary, and it is now widely applied as a promising alternative to conventional measures, such as the difference of the area under the receiver operating characteristic curve. However, the IDI can erroneously identify a significant improvement in the new model even if no additional information has been provided by new biomarkers. In order to overcome problems with existing measures, in this study, we propose the power-IDI as a measure of incremental predictive value. Our study explains why the IDI cannot avoid false detection of apparent improvements in a new model and we show that our proposed measure is better able to capture improvements in prediction. Numerical simulations and examples using real empirical data reveal that the power-IDI is not only more powerful but also incurs fewer false detections of improvement.

Asunto(s)

Biomarcadores , Modelos Estadísticos , Algoritmos , Área Bajo la Curva , Investigación Biomédica , Humanos , Modelos Logísticos , Curva ROC

8.

Diurnal Transcriptome and Gene Network Represented through Sparse Modeling in Brachypodium distachyon.

Koda, Satoru; Onda, Yoshihiko; Matsui, Hidetoshi; Takahagi, Kotaro; Yamaguchi-Uehara, Yukiko; Shimizu, Minami; Inoue, Komaki; Yoshida, Takuhiro; Sakurai, Tetsuya; Honda, Hiroshi; Eguchi, Shinto; Nishii, Ryuei; Mochida, Keiichi.

Front Plant Sci ; 8: 2055, 2017.

Artículo en Inglés | MEDLINE | ID: mdl-29234348

RESUMEN

We report the comprehensive identification of periodic genes and their network inference, based on a gene co-expression analysis and an Auto-Regressive eXogenous (ARX) model with a group smoothly clipped absolute deviation (SCAD) method using a time-series transcriptome dataset in a model grass, Brachypodium distachyon. To reveal the diurnal changes in the transcriptome in B. distachyon, we performed RNA-seq analysis of its leaves sampled through a diurnal cycle of over 48 h at 4 h intervals using three biological replications, and identified 3,621 periodic genes through our wavelet analysis. The expression data are feasible to infer network sparsity based on ARX models. We found that genes involved in biological processes such as transcriptional regulation, protein degradation, and post-transcriptional modification and photosynthesis are significantly enriched in the periodic genes, suggesting that these processes might be regulated by circadian rhythm in B. distachyon. On the basis of the time-series expression patterns of the periodic genes, we constructed a chronological gene co-expression network and identified putative transcription factors encoding genes that might be involved in the time-specific regulatory transcriptional network. Moreover, we inferred a transcriptional network composed of the periodic genes in B. distachyon, aiming to identify genes associated with other genes through variable selection by grouping time points for each gene. Based on the ARX model with the group SCAD regularization using our time-series expression datasets of the periodic genes, we constructed gene networks and found that the networks represent typical scale-free structure. Our findings demonstrate that the diurnal changes in the transcriptome in B. distachyon leaves have a sparse network structure, demonstrating the spatiotemporal gene regulatory network over the cyclic phase transitions in B. distachyon diurnal growth.

9.

Quasi-linear score for capturing heterogeneous structure in biomarkers.

Omae, Katsuhiro; Komori, Osamu; Eguchi, Shinto.

BMC Bioinformatics ; 18(1): 308, 2017 Jun 19.

Artículo en Inglés | MEDLINE | ID: mdl-28629325

RESUMEN

BACKGROUND: Linear scores are widely used to predict dichotomous outcomes in biomedical studies because of their learnability and understandability. Such approaches, however, cannot be used to elucidate biodiversity when there is heterogeneous structure in target population. RESULTS: Our study was focused on describing intrinsic heterogeneity in predictions. Because heterogeneity can be captured by a clustering method, integrating different information from different clusters should yield better predictions. Accordingly, we developed a quasi-linear score, which effectively combines the linear scores of clustered markers. We extended the linear score to the quasi-linear score by a generalized average form, the Kolmogorov-Nagumo average. We observed that two shrinkage methods worked well: ridge shrinkage for estimating the quasi-linear score, and lasso shrinkage for selecting markers within each cluster. Simulation studies and applications to real data show that the proposed method has good predictive performance compared with existing methods. CONCLUSIONS: Heterogeneous structure is captured by a clustering method. Quasi-linear scores combine such heterogeneity and have a better predictive ability compared with linear scores.

Asunto(s)

Algoritmos , Biomarcadores/metabolismo , Biomarcadores/análisis , Neoplasias de la Mama/diagnóstico , Neoplasias de la Mama/metabolismo , Neoplasias de la Mama/patología , Análisis por Conglomerados , Análisis Discriminante , Femenino , Humanos , Funciones de Verosimilitud , Modelos Logísticos , Metástasis de la Neoplasia , Transcriptoma

10.

Reproducible detection of disease-associated markers from gene expression data.

Omae, Katsuhiro; Komori, Osamu; Eguchi, Shinto.

BMC Med Genomics ; 9(1): 53, 2016 08 18.

Artículo en Inglés | MEDLINE | ID: mdl-27538512

RESUMEN

BACKGROUND: Detection of disease-associated markers plays a crucial role in gene screening for biological studies. Two-sample test statistics, such as the t-statistic, are widely used to rank genes based on gene expression data. However, the resultant gene ranking is often not reproducible among different data sets. Such irreproducibility may be caused by disease heterogeneity. RESULTS: When we divided data into two subsets, we found that the signs of the two t-statistics were often reversed. Focusing on such instability, we proposed a sign-sum statistic that counts the signs of the t-statistics for all possible subsets. The proposed method excludes genes affected by heterogeneity, thereby improving the reproducibility of gene ranking. We compared the sign-sum statistic with the t-statistic by a theoretical evaluation of the upper confidence limit. Through simulations and applications to real data sets, we show that the sign-sum statistic exhibits superior performance. CONCLUSION: We derive the sign-sum statistic for getting a robust gene ranking. The sign-sum statistic gives more reproducible ranking than the t-statistic. Using simulated data sets we show that the sign-sum statistic excludes hetero-type genes well. Also for the real data sets, the sign-sum statistic performs well in a viewpoint of ranking reproducibility.

Asunto(s)

Biología Computacional/métodos , Enfermedad/genética , Perfilación de la Expresión Génica , Biomarcadores/metabolismo , Humanos , Reproducibilidad de los Resultados

11.

Risk assessment of radioisotope contamination for aquatic living resources in and around Japan.

Okamura, Hiroshi; Ikeda, Shiro; Morita, Takami; Eguchi, Shinto.

Proc Natl Acad Sci U S A ; 113(14): 3838-43, 2016 Apr 05.

Artículo en Inglés | MEDLINE | ID: mdl-26929347

RESUMEN

Food contamination caused by radioisotopes released from the Fukushima Dai-ichi nuclear power plant is of great public concern. The contamination risk for food items should be estimated depending on the characteristics and geographic environments of each item. However, evaluating current and future risk for food items is generally difficult because of small sample sizes, high detection limits, and insufficient survey periods. We evaluated the risk for aquatic food items exceeding a threshold of the radioactive cesium in each species and location using a statistical model. Here we show that the overall contamination risk for aquatic food items is very low. Some freshwater biota, however, are still highly contaminated, particularly in Fukushima. Highly contaminated fish generally tend to have large body size and high trophic levels.

Asunto(s)

Radioisótopos de Cesio/análisis , Peces , Contaminación Radiactiva de Alimentos/análisis , Accidente Nuclear de Fukushima , Monitoreo de Radiación , Contaminantes Radiactivos del Agua/análisis , Animales , Tamaño Corporal , Japón , Plantas de Energía Nuclear , Medición de Riesgo

12.

Robust Clustering Method in the Presence of Scattered Observations.

Notsu, Akifumi; Eguchi, Shinto.

Neural Comput ; 28(6): 1141-62, 2016 06.

Artículo en Inglés | MEDLINE | ID: mdl-26942745

RESUMEN

Contamination of scattered observations, which are either featureless or unlike the other observations, frequently degrades the performance of standard methods such as K-means and model-based clustering. In this letter, we propose a robust clustering method in the presence of scattered observations called Gamma-clust. Gamma-clust is based on a robust estimation for cluster centers using gamma-divergence. It provides a proper solution for clustering in which the distributions for clustered data are nonnormal, such as t-distributions with different variance-covariance matrices and degrees of freedom. As demonstrated in a simulation study and data analysis, Gamma-clust is more flexible and provides superior results compared to the robustified K-means and model-based clustering.

13.

Generalized t-statistic for two-group classification.

Komori, Osamu; Eguchi, Shinto; Copas, John B.

Biometrics ; 71(2): 404-16, 2015 Jun.

Artículo en Inglés | MEDLINE | ID: mdl-25359078

RESUMEN

In the classic discriminant model of two multivariate normal distributions with equal variance matrices, the linear discriminant function is optimal both in terms of the log likelihood ratio and in terms of maximizing the standardized difference (the t-statistic) between the means of the two distributions. In a typical case-control study, normality may be sensible for the control sample but heterogeneity and uncertainty in diagnosis may suggest that a more flexible model is needed for the cases. We generalize the t-statistic approach by finding the linear function which maximizes a standardized difference but with data from one of the groups (the cases) filtered by a possibly nonlinear function U. We study conditions for consistency of the method and find the function U which is optimal in the sense of asymptotic efficiency. Optimality may also extend to other measures of discriminatory efficiency such as the area under the receiver operating characteristic curve. The optimal function U depends on a scalar probability density function which can be estimated non-parametrically using a standard numerical algorithm. A lasso-like version for variable selection is implemented by adding L1-regularization to the generalized t-statistic. Two microarray data sets in the study of asthma and various cancers are used as motivating examples.

Asunto(s)

Análisis Discriminante , Alérgenos , Asma/inmunología , Biometría , Estudios de Casos y Controles , Humanos , Funciones de Verosimilitud , Modelos Lineales , Modelos Estadísticos , Análisis Multivariante , Curva ROC , Estadísticas no Paramétricas

14.

Individualized prostate-specific antigen threshold values to avoid overdiagnosis of prostate cancer and reduce unnecessary biopsy in elderly men.

Kanao, Kent; Komori, Osamu; Nakashima, Jun; Ohigashi, Takashi; Kikuchi, Eiji; Miyajima, Akira; Nakagawa, Ken; Eguchi, Shinto; Oya, Mototsugu.

Jpn J Clin Oncol ; 44(9): 852-9, 2014 Sep.

Artículo en Inglés | MEDLINE | ID: mdl-25030213

RESUMEN

OBJECTIVE: To individualize prostate-specific antigen threshold values to avoid overdiagnosis of prostate cancer and reduce unnecessary biopsy in elderly men. METHODS: A total of 406 men aged over 70 years old with prostate-specific antigen levels between 4.0 and 20.0 ng/ml, normal digital rectal examination results and diagnosed by transrectal needle biopsy were retrospectively analyzed. The patients were divided into a no/favorable-risk cancer group or an unfavorable-risk cancer group based on their Gleason score and the number of positive cores. Prostate-specific antigen levels, percent free prostate-specific antigen level, prostate transition zone volume and the number of previous biopsies were used to discriminate between the two groups. The optimal individualized prostate-specific antigen threshold values based on the other variables that gave a sensitivity of 95% for the detection of unfavorable-risk cancer were calculated using a boosting method for maximizing the area under the receiver operating characteristic curve. RESULTS: A total of 66 men had favorable-risk cancer, and 139 had unfavorable-risk cancer. The area under the receiver operating characteristic curve of the combination model determined by the boosting method for maximizing the area under the receiver operating characteristic curve was 0.852. The sensitivity and specificity of the threshold values for the detection of unfavorable-risk cancer were 95 and 36%, respectively. By using the threshold values, 100 (25%) of the subjects with no/favorable-risk cancer could have avoided undergoing biopsies, with a <5% risk of missing the detection of unfavorable-risk cancer. CONCLUSIONS: These individualized prostate-specific antigen threshold values may be useful for determining an indication of prostate biopsy for elderly men to avoid overdiagnosis of prostate cancer and reduce unnecessary biopsy.

Asunto(s)

Biomarcadores de Tumor/sangre , Biopsia con Aguja , Medicina de Precisión/métodos , Antígeno Prostático Específico/sangre , Neoplasias de la Próstata/sangre , Neoplasias de la Próstata/diagnóstico , Anciano , Anciano de 80 o más Años , Área Bajo la Curva , Humanos , Masculino , Neoplasias de la Próstata/cirugía , Curva ROC , Estudios Retrospectivos , Medición de Riesgo , Sensibilidad y Especificidad , Procedimientos Innecesarios/tendencias

15.

Spontaneous clustering via minimum Î³-divergence.

Notsu, Akifumi; Komori, Osamu; Eguchi, Shinto.

Neural Comput ; 26(2): 421-48, 2014 Feb.

Artículo en Inglés | MEDLINE | ID: mdl-24206383

RESUMEN

We propose a new method for clustering based on local minimization of the gamma-divergence, which we call spontaneous clustering. The greatest advantage of the proposed method is that it automatically detects the number of clusters that adequately reflect the data structure. In contrast, existing methods, such as K-means, fuzzy c-means, or model-based clustering need to prescribe the number of clusters. We detect all the local minimum points of the gamma-divergence, by which we define the cluster centers. A necessary and sufficient condition for the gamma-divergence to have local minimum points is also derived in a simple setting. Applications to simulated and real data are presented to compare the proposed method with existing ones.

Asunto(s)

Análisis por Conglomerados , Lógica Difusa , Reconocimiento de Normas Patrones Automatizadas , Reconocimiento de Normas Patrones Automatizadas/métodos

16.

Statistical analysis of biomarkers for personalized medicine.

Eguchi, Shinto; Matsui, Shigeyuki; Huang, Su-Yun; Hsiao, Chuhsing Kate.

Comput Math Methods Med ; 2013: 467420, 2013.

Artículo en Inglés | MEDLINE | ID: mdl-24307913

Asunto(s)

Biomarcadores/análisis , Medicina de Precisión/métodos , Bioestadística , Marcadores Genéticos , Humanos , Medicina de Precisión/estadística & datos numéricos

17.

Multiple suboptimal solutions for prediction rules in gene expression data.

Komori, Osamu; Pritchard, Mari; Eguchi, Shinto.

Comput Math Methods Med ; 2013: 798189, 2013.

Artículo en Inglés | MEDLINE | ID: mdl-23662163

RESUMEN

This paper discusses mathematical and statistical aspects in analysis methods applied to microarray gene expressions. We focus on pattern recognition to extract informative features embedded in the data for prediction of phenotypes. It has been pointed out that there are severely difficult problems due to the unbalance in the number of observed genes compared with the number of observed subjects. We make a reanalysis of microarray gene expression published data to detect many other gene sets with almost the same performance. We conclude in the current stage that it is not possible to extract only informative genes with high performance in the all observed genes. We investigate the reason why this difficulty still exists even though there are actively proposed analysis methods and learning algorithms in statistical machine learning approaches. We focus on the mutual coherence or the absolute value of the Pearson correlations between two genes and describe the distributions of the correlation for the selected set of genes and the total set. We show that the problem of finding informative genes in high dimensional data is ill-posed and that the difficulty is closely related with the mutual coherence.

Asunto(s)

Perfilación de la Expresión Génica/estadística & datos numéricos , Algoritmos , Inteligencia Artificial , Neoplasias de la Mama/genética , Análisis por Conglomerados , Biología Computacional , Bases de Datos Genéticas/estadística & datos numéricos , Femenino , Humanos , Modelos Estadísticos , Análisis de Secuencia por Matrices de Oligonucleótidos/estadística & datos numéricos , Reconocimiento de Normas Patrones Automatizadas/estadística & datos numéricos , Fenotipo

18.

An extension of the receiver operating characteristic curve and AUC-optimal classification.

Takenouchi, Takashi; Komori, Osamu; Eguchi, Shinto.

Neural Comput ; 24(10): 2789-824, 2012 Oct.

Artículo en Inglés | MEDLINE | ID: mdl-22734493

RESUMEN

While most proposed methods for solving classification problems focus on minimization of the classification error rate, we are interested in the receiver operating characteristic (ROC) curve, which provides more information about classification performance than the error rate does. The area under the ROC curve (AUC) is a natural measure for overall assessment of a classifier based on the ROC curve. We discuss a class of concave functions for AUC maximization in which a boosting-type algorithm including RankBoost is considered, and the Bayesian risk consistency and the lower bound of the optimum function are discussed. A procedure derived by maximizing a specific optimum function has high robustness, based on gross error sensitivity. Additionally, we focus on the partial AUC, which is the partial area under the ROC curve. For example, in medical screening, a high true-positive rate to the fixed lower false-positive rate is preferable and thus the partial AUC corresponding to lower false-positive rates is much more important than the remaining AUC. We extend the class of concave optimum functions for partial AUC optimality with the boosting algorithm. We investigated the validity of the proposed method through several experiments with data sets in the UCI repository.

Asunto(s)

Área Bajo la Curva , Solución de Problemas , Curva ROC , Algoritmos , Humanos , Reconocimiento de Normas Patrones Automatizadas

19.

Robust QTL analysis by minimum beta-divergence method.

Mollah, Md Nurul Haque; Eguchi, Shinto.

Int J Data Min Bioinform ; 4(4): 471-85, 2010.

Artículo en Inglés | MEDLINE | ID: mdl-20815143

RESUMEN

Robustness has received too little attention in Quantitative Trait Loci (QTL) analysis in experimental crosses. This paper discusses a robust QTL mapping algorithm based on Composite Interval Mapping (CIM) model by minimising beta-divergence using the EM like algorithm. We investigate the robustness performance of the proposed method in a comparison of Interval Mapping (IM) and CIM algorithms using both synthetic and real datasets. Experimental results show that the proposed method significantly improves the performance over the traditional IM and CIM methods for QTL analysis in presence of outliers; otherwise, it keeps equal performance.

Asunto(s)

Algoritmos , Sitios de Carácter Cuantitativo , Mapeo Cromosómico , Cruzamientos Genéticos

20.

A boosting method for maximizing the partial area under the ROC curve.

Komori, Osamu; Eguchi, Shinto.

BMC Bioinformatics ; 11: 314, 2010 Jun 10.

Artículo en Inglés | MEDLINE | ID: mdl-20537139

RESUMEN

BACKGROUND: The receiver operating characteristic (ROC) curve is a fundamental tool to assess the discriminant performance for not only a single marker but also a score function combining multiple markers. The area under the ROC curve (AUC) for a score function measures the intrinsic ability for the score function to discriminate between the controls and cases. Recently, the partial AUC (pAUC) has been paid more attention than the AUC, because a suitable range of the false positive rate can be focused according to various clinical situations. However, existing pAUC-based methods only handle a few markers and do not take nonlinear combination of markers into consideration. RESULTS: We have developed a new statistical method that focuses on the pAUC based on a boosting technique. The markers are combined componentially for maximizing the pAUC in the boosting algorithm using natural cubic splines or decision stumps (single-level decision trees), according to the values of markers (continuous or discrete). We show that the resulting score plots are useful for understanding how each marker is associated with the outcome variable. We compare the performance of the proposed boosting method with those of other existing methods, and demonstrate the utility using real data sets. As a result, we have much better discrimination performances in the sense of the pAUC in both simulation studies and real data analysis. CONCLUSIONS: The proposed method addresses how to combine the markers after a pAUC-based filtering procedure in high dimensional setting. Hence, it provides a consistent way of analyzing data based on the pAUC from maker selection to marker combination for discrimination problems. The method can capture not only linear but also nonlinear association between the outcome variable and the markers, about which the nonlinearity is known to be necessary in general for the maximization of the pAUC. The method also puts importance on the accuracy of classification performance as well as interpretability of the association, by offering simple and smooth resultant score plots for each marker.

Asunto(s)

Reconocimiento de Normas Patrones Automatizadas/métodos , Curva ROC , Algoritmos , Área Bajo la Curva , Biomarcadores/química

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA