An efficient method to estimate the optimum regularization parameter in RLDA.

Bakir, Daniyar; James, Alex Pappachen; Zollanvari, Amin

Affiliation

Bakir D; Department of Electrical and Electronics Engineering, Nazarbayev University, Astana, 010000, Kazakhstan.
James AP; Department of Electrical and Electronics Engineering, Nazarbayev University, Astana, 010000, Kazakhstan.
Zollanvari A; Department of Electrical and Electronics Engineering, Nazarbayev University, Astana, 010000, Kazakhstan.

Bioinformatics; 32(22): 3461-3468, 2016 Nov 15.

Article in English | MEDLINE | ID: mdl-27485443

ABSTRACT

MOTIVATION: Biomarker discovery in high-throughput genomic profiles has presented the statistical learning community with a challenging problem: learning when the number of variables is comparable to or exceeds the sample size. In these settings, many classical techniques, including linear discriminant analysis (LDA), falter. The poor performance of LDA is attributed to the ill-conditioned nature of the sample covariance matrix when the dimension and sample size are comparable. To alleviate this problem, regularized LDA (RLDA) was classically proposed, in which the sample covariance matrix is replaced by its ridge estimate. However, the performance of RLDA depends heavily on the regularization parameter used in that ridge estimate.

RESULTS: We propose a range-search technique for efficient estimation of the optimum regularization parameter. Using an extensive set of simulations based on synthetic and gene expression microarray data, we demonstrate the robustness of the proposed technique to departures from Gaussianity, an assumption used in developing the core estimator. We compare the accuracy and efficiency of the technique with those of classical techniques for estimating the regularization parameter. In terms of accuracy, the results indicate that the proposed method vastly improves on similar techniques that use the classical plug-in estimator. In that respect, it is better than or comparable to cross-validation-based search strategies while, depending on the sample size and dimensionality, being tens to hundreds of times faster to compute.

AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/danik0411/optimum-rlda

CONTACT: amin.zollanvari@nu.edu.kz

SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.
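
To make the role of the regularization parameter concrete, the following is a minimal sketch of a two-class RLDA discriminant in which the pooled sample covariance is replaced by a ridge estimate. It is not the authors' range-search estimator (see the repository linked in the abstract for that). The ridge form `S + gamma * I`, the equal-prior decision rule, and the names `fit_rlda`, `predict_rlda` and `gamma` are assumptions made for illustration only.

```python
import numpy as np

def fit_rlda(X0, X1, gamma):
    """Fit a two-class RLDA discriminant (illustrative sketch, equal priors assumed)."""
    n0, n1 = len(X0), len(X1)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled sample covariance; it is singular whenever p > n0 + n1 - 2.
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (n0 + n1 - 2)
    p = S.shape[0]
    # Assumed ridge estimate: S + gamma * I, invertible for any gamma > 0.
    w = np.linalg.solve(S + gamma * np.eye(p), m0 - m1)
    b = -0.5 * w @ (m0 + m1)
    return w, b

def predict_rlda(X, w, b):
    """Assign class 0 where the linear discriminant is positive, else class 1."""
    return np.where(X @ w + b > 0, 0, 1)

# Synthetic example with more variables (p = 50) than samples (10 per class).
rng = np.random.default_rng(0)
X0 = rng.normal(size=(10, 50)) + 1.5
X1 = rng.normal(size=(10, 50)) - 1.5
w, b = fit_rlda(X0, X1, gamma=1.0)
```

In this sketch `gamma` is fixed by hand; the abstract's point is that classification accuracy depends heavily on this value, which is why an efficient way to estimate it matters.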

Subjects

Algorithms; Biomarkers; Genomics; Animals; Biometry; Discriminant Analysis; Humans; Normal Distribution; Sample Size

Database: MEDLINE Main subject: Algorithms / Biomarkers / Genomics Language: English Publication year: 2016 Document type: Article Country of affiliation: Kazakhstan
