Upper bound of the expected training error of neural network regression for a Gaussian noise sequence.
Hagiwara, K; Hayasaka, T; Toda, N; Usui, S; Kuno, K.
Affiliation
  • Hagiwara K; Faculty of Physics Engineering, Mie University, Tsu, Japan. hagi@phen.mie-u.ac.jp
Neural Netw; 14(10): 1419-29, 2001 Dec.
Article in En | MEDLINE | ID: mdl-11771721
ABSTRACT
In neural network regression problems, often formulated as additive noise models, NIC (Network Information Criterion) has been proposed as a general model selection criterion for determining the network size that yields high generalization performance. Although NIC is derived via asymptotic expansion, it has been pointed out that this technique does not apply when the target function lies in the assumed family of networks and the family is not minimal for representing it, i.e. the overrealizable case, in which NIC reduces to the well-known AIC (Akaike Information Criterion) and related criteria, depending on the loss function. Since NIC is an unbiased estimator of the generalization error based on the training error, the expectations of these errors must be derived for such cases. This paper gives upper bounds on the expectation of the training error with respect to the distribution of the training data, which we call the expected training error, for several types of networks under the squared error loss. In the overrealizable case the errors are determined by how well the networks fit the noise components in the data, so the target data set is taken to be a Gaussian noise sequence. For radial basis function networks and 3-layered neural networks with a bell-shaped activation function in the hidden layer, the expected training error is bounded above by $\sigma_*^2 - 2n\sigma_*^2 \log T / T$, where $\sigma_*^2$ is the noise variance, $n$ is the number of basis functions or hidden units, and $T$ is the number of data. Furthermore, for 3-layered neural networks with a sigmoidal activation function in the hidden layer, we obtain the upper bound $\sigma_*^2 - O(\log T / T)$ when $n > 2$. If the number of data is large enough, these bounds on the expected training error are smaller than $\sigma_*^2 - N(n)\sigma_*^2 / T$ as evaluated in NIC, where $N(n)$ is the total number of network parameters.
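As a quick numerical illustration of the abstract's closing comparison, the sketch below evaluates both bounds for a small network. The values of $\sigma_*^2$, $n$, and $T$, and the parameter count $N(n) = 3n + 1$ (a 1-input, 1-output 3-layered network), are illustrative assumptions and are not taken from the paper.

    import math

    def nic_bound(sigma2, n_params, T):
        # Expected training error as evaluated by NIC:
        # sigma^2_* - N(n) * sigma^2_* / T, with N(n) the total parameter count.
        return sigma2 - n_params * sigma2 / T

    def paper_bound(sigma2, n, T):
        # Upper bound from the paper for RBF networks and 3-layered networks
        # with a bell-shaped hidden activation: sigma^2_* - 2 n sigma^2_* log(T) / T.
        return sigma2 - 2.0 * n * sigma2 * math.log(T) / T

    sigma2 = 1.0          # noise variance sigma^2_* (illustrative)
    n = 5                 # number of basis functions / hidden units (illustrative)
    n_params = 3 * n + 1  # hypothetical N(n) for a 1-input, 1-output 3-layered network

    for T in (100, 1000, 10000):
        print(f"T={T:6d}  NIC: {nic_bound(sigma2, n_params, T):.4f}  "
              f"paper: {paper_bound(sigma2, n, T):.4f}")

Whenever $T$ is large enough that $2n \log T > N(n)$, the paper's bound falls below the NIC evaluation, which is exactly the relation the abstract's final sentence asserts.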
Subjects
Collections: 01-international Database: MEDLINE Main subject: Normal Distribution / Neural Networks, Computer Study type: Diagnostic_studies / Prognostic_studies Language: En Journal: Neural Netw Journal subject: NEUROLOGY Year of publication: 2001 Document type: Article Country of affiliation: Japan