ABSTRACT
This paper evaluates the statistical association between exposure to air pollution and forced expiratory volume in the first second (FEV1) in asthmatic and non-asthmatic children and teenagers, where the response variable FEV1 was measured repeatedly on a monthly basis, characterizing a longitudinal experiment. Due to the nature of the data, a robust linear mixed model (RLMM), combined with a robust principal component analysis (RPCA), is proposed to handle the multicollinearity among the covariates and the impact of extreme observations (high levels of air contaminants) on the estimates. The Huber and Tukey loss functions are considered to obtain robust estimators of the parameters in the linear mixed model (LMM). A finite-sample investigation is conducted under scenarios where the covariates follow linear time series models with and without additive outliers (AO), and the impact of time correlation and outliers on the estimates of the fixed-effect parameters in the LMM is assessed. In the real data analysis, the robust modeling strategy showed that the RPCA yields three principal components (PCs), mainly related to relative humidity (Hmd), particulate matter with a diameter smaller than 10 µm (PM10), and particulate matter with a diameter smaller than 2.5 µm (PM2.5).
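A minimal sketch of the two-stage idea described above: reduce collinear pollutant covariates with a principal component analysis, then fit a Huber-loss regression on the component scores. This is a simplified stand-in, not the paper's method: ordinary PCA replaces the robust PCA, the random effects of the mixed model are omitted, and all data are simulated.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import HuberRegressor

rng = np.random.default_rng(0)
n = 300
pm25 = rng.gamma(4.0, 5.0, n)                  # simulated PM2.5 levels
pm10 = pm25 * 1.6 + rng.normal(0, 3, n)        # strongly collinear with PM2.5
hmd = rng.normal(70, 8, n)                     # relative humidity
X = np.column_stack([pm25, pm10, hmd])

# Simulated FEV1 response with a few gross additive outliers
fev1 = 2.5 - 0.01 * pm25 - 0.005 * pm10 + 0.002 * hmd + rng.normal(0, 0.1, n)
fev1[:5] += 3.0                                # additive outliers (AO)

scores = PCA(n_components=2).fit_transform(X)  # decorrelated components
model = HuberRegressor().fit(scores, fev1)     # Huber loss downweights outliers
print(model.coef_)
```

The Huber loss bounds the influence of the five contaminated responses, so the fitted coefficients on the PC scores are far less distorted than an ordinary least-squares fit would be.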
ABSTRACT
BACKGROUND: Several strategies for identifying biologically implausible values in longitudinal anthropometric data have recently been proposed, but the suitability of these strategies for large population datasets needs to be better understood. This study evaluated the impact of removing population outliers, and the additional value of identifying and removing longitudinal outliers, on the trajectories of length/height and weight and on the prevalence of child growth indicators in a large longitudinal dataset of child growth data. METHODS: Length/height and weight measurements of children aged 0 to 59 months from the Brazilian Food and Nutrition Surveillance System were analyzed. Population outliers were identified using z-scores from the World Health Organization (WHO) growth charts. After identifying and removing population outliers, residuals from linear mixed-effects models were used to flag longitudinal outliers. The following cutoffs for the residuals were tested: -3/+3, -4/+4, -5/+5, and -6/+6. The selected child growth indicators included length/height-for-age z-scores and weight-for-age z-scores, classified according to the WHO charts. RESULTS: The dataset included 50,154,738 records from 10,775,496 children. Boys and girls had 5.74% and 5.31% of length/height and 5.19% and 4.74% of weight values flagged as population outliers, respectively. After removing these, the percentage of longitudinal outliers varied from 0.02% (<-6/>+6) to 1.47% (<-3/>+3) for length/height and from 0.07% to 1.44% for weight in boys. In girls, the percentage of longitudinal outliers varied from 0.01% to 1.50% for length/height and from 0.08% to 1.45% for weight. The initial removal of population outliers, as the first step in the cleaning process, played the most substantial role in the growth trajectories, while the additional removal of longitudinal outliers had a smaller influence, regardless of the cutoff adopted.
The prevalence of the selected indicators was also affected by both population and, to a lesser extent, longitudinal outliers. CONCLUSIONS: Although both population and longitudinal outliers can detect biologically implausible values in child growth data, removing population outliers seemed more relevant in this large administrative dataset, especially in calculating summary statistics. However, both types of outliers need to be identified and removed for the proper evaluation of trajectories.
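The two-step cleaning described above can be sketched in a few lines: flag "population" outliers with z-scores against a reference, then flag "longitudinal" outliers from per-child regression residuals. This is a toy illustration: a plain per-child linear fit stands in for the linear mixed-effects model, and a hypothetical linear reference replaces the WHO growth charts.

```python
import numpy as np

rng = np.random.default_rng(1)
age = np.tile(np.arange(0, 24), 50)            # 50 children, monthly ages
child = np.repeat(np.arange(50), 24)
height = 50 + 2.0 * age + rng.normal(0, 1.5, age.size)
height[10] = 150.0                             # a biologically implausible value

# Step 1: population outliers via z-scores (hypothetical reference curve)
ref_mean, ref_sd = 50 + 2.0 * age, 4.0
z = (height - ref_mean) / ref_sd
pop_outlier = np.abs(z) > 6

# Step 2: longitudinal outliers via residuals of per-child fits
resid = np.full(age.size, np.nan)
for c in np.unique(child):
    m = (child == c) & ~pop_outlier
    if m.sum() > 2:
        b, a = np.polyfit(age[m], height[m], 1)
        resid[m] = height[m] - (a + b * age[m])
long_outlier = np.abs(resid / np.nanstd(resid)) > 4
print(int(pop_outlier.sum()), int(np.nansum(long_outlier)))
```

Note the ordering matters, as the abstract emphasizes: population outliers are removed before the longitudinal fits, so a gross value cannot distort the trajectory it is judged against.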
Subject(s)
Body Height, Growth Charts, Child, Male, Female, Humans, Body Weight, Brazil/epidemiology, Anthropometry
ABSTRACT
Analysing data from educational tests allows governments to make decisions for improving the quality of life of individuals in a society. One of the key responsibilities of statisticians is to develop models that provide decision-makers with pertinent information about the latent process that educational tests seek to represent. Mixtures of t factor analysers (MtFA) have emerged as a powerful device for model-based clustering and classification of high-dimensional data containing one or several groups of observations with fatter tails or anomalous outliers. This paper considers an extension of MtFA for robust clustering of censored data, referred to as the MtFAC model, by incorporating external covariates. The enhanced flexibility of including covariates in MtFAC enables cluster-specific multivariate regression analysis of dependent variables with censored responses arising from upper and/or lower detection limits of experimental equipment. An alternating expectation conditional maximization (AECM) algorithm is developed for maximum likelihood estimation of the proposed model. Two simulation experiments are conducted to examine the effectiveness of the techniques presented. Furthermore, the proposed methodology is applied to Peruvian data from the 2007 Early Grade Reading Assessment, and the results obtained from the analysis provide new insights regarding the reading skills of Peruvian students.
Subject(s)
Algorithms, Quality of Life, Humans, Likelihood Functions, Peru, Multivariate Analysis, Computer Simulation
ABSTRACT
This study examines the effects of the underlying population distribution (normal, non-normal) and of outliers on the magnitude of the Pearson, Spearman, and winsorized Pearson correlation coefficients through Monte Carlo simulation. The study considers sample sizes of 50, 100, 250, 500, and 1000 observations and, for each sample size, underlying population correlations of 0.12, 0.20, 0.31, and 0.50, under conditions of bivariate normality, bivariate normality with outliers (discordant, contaminant), and non-normality with different values of skewness and kurtosis. The results show that outliers have a greater effect than the shape of the data distribution: the effect is substantial for the Pearson coefficient and smaller for the Spearman and winsorized Pearson coefficients. Additionally, outliers are shown to affect the assessment of bivariate normality using Mardia's test and to cause problems for decisions based on skewness and kurtosis when assessing univariate normality. Implications of these results are discussed.
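The core comparison in the study above can be reproduced in miniature: compute the Pearson, Spearman, and winsorized Pearson correlations on bivariate normal data contaminated with one discordant outlier. Winsorizing each variable separately, as done here, is one common simplification of the winsorized correlation; the design (n = 200, true correlation 0.5) is illustrative only.

```python
import numpy as np
from scipy import stats
from scipy.stats.mstats import winsorize

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=np.sqrt(1 - 0.25), size=200)
x[0], y[0] = 10.0, -10.0                       # one discordant outlier

pearson, _ = stats.pearsonr(x, y)
spearman, _ = stats.spearmanr(x, y)
win, _ = stats.pearsonr(np.asarray(winsorize(x, (0.05, 0.05))),
                        np.asarray(winsorize(y, (0.05, 0.05))))
print(round(pearson, 2), round(spearman, 2), round(win, 2))
```

The single discordant point drags the Pearson coefficient toward zero, while the rank-based Spearman and the winsorized Pearson coefficients stay close to the population value, mirroring the pattern the abstract reports.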
ABSTRACT
Based on case studies, in this chapter we discuss the extent to which the number and identity of quantitative trait loci (QTL) identified from genome-wide association studies (GWAS) are affected by the curation and analysis of phenotypic data. The chapter demonstrates through examples the impact of (1) outlier cleaning and (2) the choice of statistical method for estimating genotypic mean values of phenotypic inputs in GWAS. Omitting outlier cleaning resulted in the highest number of dubious QTL, especially at loci with highly unbalanced allelic frequencies. A trade-off was identified between the risk of false positives and the risk of missing interesting, yet rare, alleles. The choice of the statistical method for estimating genotypic mean values also affected the output of the GWAS analysis, with reduced QTL overlap between methods. Using mixed models that capture spatial trends, among other features, increased the narrow-sense heritability of traits, the number of identified QTL, and the overall power of the GWAS analysis. Cleaning and choosing robust statistical models for estimating genotypic mean values should be included in GWAS pipelines to decrease both false positive and false negative rates of QTL detection.
Subject(s)
Genome-Wide Association Study, Polymorphism, Single Nucleotide, Alleles, Gene Frequency, Genome-Wide Association Study/methods, Quantitative Trait Loci
ABSTRACT
The determination of radial basis function (RBF) network centers is an open problem. This work determines cluster centers with a proposed gradient algorithm that uses the information forces acting on each data point. These centers are then applied in an RBF network for data classification, and a threshold based on the information potential is established to classify outliers. The proposed algorithms are analysed on databases varying the number of clusters, cluster overlap, noise, and imbalance of cluster sizes. Combined, the threshold and the centers determined by information forces show good results compared with a similar network using the k-means clustering algorithm.
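A sketch of the baseline the abstract compares against: an RBF classifier whose centers come from k-means (standing in for the paper's information-force gradient algorithm), with Gaussian RBF activations feeding a linear readout and a simple distance threshold flagging outliers. All data and the threshold rule are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import RidgeClassifier
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=3)
centers = KMeans(n_clusters=3, n_init=10, random_state=3).fit(X).cluster_centers_

def rbf_features(X, centers, gamma=0.5):
    # Gaussian RBF activation for each (sample, center) pair
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

clf = RidgeClassifier().fit(rbf_features(X, centers), y)
acc = clf.score(rbf_features(X, centers), y)

# Outlier rule: far from every center (the 3-sigma cutoff is arbitrary)
dist = np.sqrt(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)).min(1)
outliers = dist > dist.mean() + 3 * dist.std()
print(round(acc, 2), int(outliers.sum()))
```

Swapping the k-means step for a different center-selection rule is the only change needed to emulate the paper's proposal, which is what makes this architecture a convenient test bed.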
ABSTRACT
BACKGROUND: Universal health coverage promises equity in access to and quality of health services. However, there is variability in the quality of care (QoC) delivered at health facilities in low- and middle-income countries (LMICs). Detecting gaps in the implementation of clinical guidelines is key to prioritizing efforts to improve quality of care. The aim of this study was to present statistical methods that maximize the use of existing electronic medical records (EMR) to monitor compliance with evidence-based care guidelines in LMICs. METHODS: We used iSanté, Haiti's largest EMR system, to assess adherence to treatment guidelines and retention on treatment of HIV patients across Haitian HIV care facilities. We selected three processes of care - (1) implementation of a 'test and start' approach to antiretroviral therapy (ART), (2) implementation of HIV viral load testing, and (3) uptake of multi-month scripting for ART - and three continuity of care indicators - (4) timely ART pick-up, (5) 6-month ART retention of pregnant women, and (6) 6-month ART retention of non-pregnant adults. We estimated these six indicators using a model-based approach to account for their volatility and measurement error. We added a case-mix adjustment for the continuity of care indicators to account for the effect of factors other than medical care (biological, socio-economic). We combined the six indicators in a composite measure of appropriate care based on adherence to treatment guidelines. RESULTS: We analyzed data from 65,472 patients seen in 89 health facilities between June 2016 and March 2018. Adoption of treatment guidelines differed greatly between facilities; several facilities displayed 100% compliance failure, suggesting implementation issues. Risk-adjusted continuity of care indicators showed less variability, although several facilities had patient retention rates that deviated significantly from the national average.
Based on the composite measure, we identified two facilities with consistently poor performance and two star performers. CONCLUSIONS: Our work demonstrates the potential of EMRs to detect gaps in appropriate care processes, and thereby to guide quality improvement efforts. Closing quality gaps will be pivotal in achieving equitable access to quality care in LMICs.
Subject(s)
Electronic Health Records, Guideline Adherence/statistics & numerical data, HIV Infections/drug therapy, Practice Guidelines as Topic, Quality Improvement/organization & administration, Adult, Anti-HIV Agents/therapeutic use, Female, Haiti, Health Facilities/standards, Health Services Research, Humans, Male, Middle Aged, Pregnancy, Young Adult
ABSTRACT
Zero-adjusted regression models are used to fit variables that are discrete at zero and continuous on some interval of the positive real line. Diagnostic analysis in these models is usually performed using the randomized quantile residual, which is useful for checking the overall adequacy of a zero-adjusted regression model; however, it may fail to identify some outliers. In this work, we introduce a class of residuals for outlier identification in zero-adjusted regression models. Monte Carlo simulation studies and two applications suggest that one of the residuals in the proposed class has good properties and detects outliers that are not identified by the randomized quantile residual.
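The randomized quantile residual discussed above can be sketched for a zero-adjusted response: map each observation through its fitted distribution function, randomizing uniformly over the point mass at zero, then transform to the normal scale. If the model is correct, the residuals are approximately standard normal. The "fitted" zero-adjusted gamma parameters here are hypothetical constants rather than estimates.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
p0, shape, scale = 0.3, 2.0, 1.5               # assumed fitted parameters
n = 500
y = np.where(rng.random(n) < p0, 0.0, rng.gamma(shape, scale, n))

# CDF of the zero-adjusted gamma: point mass p0 at zero, then gamma tail
u = np.where(
    y == 0,
    rng.uniform(0, p0, n),                     # randomize within the mass at 0
    p0 + (1 - p0) * stats.gamma.cdf(y, shape, scale=scale),
)
rq_resid = stats.norm.ppf(u)                   # ~ N(0,1) under the true model
print(round(rq_resid.mean(), 2), round(rq_resid.std(), 2))
```

The randomization over the zero mass is exactly what makes this residual continuous, and also why a single run can mask an outlier at zero, which motivates the alternative residual class the abstract proposes.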
ABSTRACT
Geodetic networks provide accurate three-dimensional control points for mapping activities, geoinformation, and infrastructure works. Accurate computation and adjustment are necessary, as all data collection is vulnerable to outliers. In such conditions, applying a Least Squares (LS) process can lead to inaccuracy over many points. Robust Estimator (RE) methods are less sensitive to outliers and provide an alternative to conventional LS. To solve the RE functions, we propose a new metaheuristic (MH) based on the Vortex Search algorithm (IVS), along with a novel search-space definition scheme. Numerous scenarios for a Global Navigation Satellite Systems (GNSS)-based network are generated to compare and analyze the behavior of several known REs. A classic iterative RE and an LS process are also tested for comparison. We analyze the median and trimmed positions of several estimators to verify their impact on the estimates. The tests show that IVS performs better than the original algorithm; we therefore adopted it in all subsequent RE computations. Regarding network adjustments, the parameter-estimation outcomes show that REs achieve better results in scenarios with large outliers. For detection, both LS and REs identify most outliers in schemes with large outliers.
ABSTRACT
This paper proposes the use of random forests for adulteration detection, combining the random forest algorithm with the artificial generation of outliers from the authentic samples. This proposal was applied in two food adulteration studies: evening primrose oils using ATR-FTIR spectroscopy and ground nutmeg using NIR diffuse reflectance spectroscopy. The primrose oil was adulterated with soybean, corn, and sunflower oils, and the model was validated using these adulterated oils and other oils, such as rosehip and andiroba, in pure and adulterated forms. The ground nutmeg was adulterated with cumin, commercial monosodium glutamate, soil, roasted coffee husks, and wood sawdust. For the primrose oil, the proposed method outperformed PLS-DA and performed similarly to SIMCA; for the ground nutmeg, the random forest was superior to both PLS-DA and SIMCA. Moreover, in both applications using the random forest, no sample had to be excluded from the external validation set.
Subject(s)
Food Contamination/analysis, Linoleic Acids/chemistry, Plant Oils/chemistry, Spectroscopy, Fourier Transform Infrared/methods, gamma-Linolenic Acid/chemistry, Corn Oil/analysis, Limit of Detection, Myristica/chemistry, Oenothera biennis, Soybean Oil/analysis, Sunflower Oil/analysis
ABSTRACT
Sparse coding aims to find a parsimonious representation of an example given an observation matrix or dictionary. In this regard, Orthogonal Matching Pursuit (OMP) provides an intuitive, simple, and fast approximation of the optimal solution. However, its main building block is anchored on the minimization of the Mean Squared Error (MSE) cost function. This approach is only optimal if the errors follow a Gaussian distribution without samples that strongly deviate from the main mode, i.e., outliers. If this assumption is violated, the sparse code will likely be biased and performance will degrade accordingly. In this paper, we introduce five robust variants of OMP (RobOMP), fully based on the theory of M-estimators under a linear model. The proposed framework exploits efficient Iteratively Reweighted Least Squares (IRLS) techniques to mitigate the effect of outliers and emphasize the samples corresponding to the main mode of the data. This is done adaptively via a learned weight vector that models the distribution of the data in a robust manner. Experiments on synthetic data under several noise distributions, and on image recognition under different combinations of occlusion and missing pixels, thoroughly detail the superiority of RobOMP over MSE-based approaches and similar robust alternatives. We also introduce a denoising framework based on robust, sparse, and redundant representations that opens the door to further applications of the proposed techniques. The five variants of RobOMP require no parameter tuning from the user and hence constitute principled alternatives to OMP.
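The M-estimation core behind RobOMP can be illustrated without the greedy pursuit loop: iteratively reweighted least squares with Huber weights, which downweights residuals that deviate strongly from the main mode. This is a generic IRLS sketch on a plain linear model under simulated contamination, not the paper's full algorithm.

```python
import numpy as np

def huber_irls(X, y, c=1.345, n_iter=20):
    """IRLS for the Huber M-estimator with MAD-based robust scale."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745 + 1e-12   # robust scale estimate
        u = np.abs(r / s)
        w = np.where(u <= c, 1.0, c / u)            # Huber weight function
        Xw = X * w[:, None]
        beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)  # weighted normal equations
    return beta

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(0, 0.1, 200)
y[:10] += 8.0                                   # gross outliers

beta_rob = huber_irls(X, y)
print(np.round(beta_rob, 2))
```

Inside RobOMP, the same reweighting replaces the MSE-based least-squares step at each greedy iteration, so the support selection itself becomes resistant to the contaminated samples.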
ABSTRACT
The risk of malaria infection displays spatial and temporal variability that is likely due to interaction between the physical environment and the human population. In this study, we performed a spatial analysis at three different time points, corresponding to three cross-sectional surveys conducted as part of an insecticide-treated bed nets efficacy study, to reveal patterns of malaria incidence distribution in an area of Northern Guatemala characterized by low malaria endemicity. A thorough understanding of the spatial and temporal patterns of malaria distribution is essential for targeted malaria control programs. Two methods, the local Moran's I and the Getis-Ord G*(d), were used for the analysis, providing two different statistical approaches and allowing for a comparison of results. A distance band of 3.5 km was considered to be the most appropriate distance for the analysis of data based on epidemiological and entomological factors. Incidence rates were higher at the first cross-sectional survey conducted prior to the intervention compared to the following two surveys. Clusters or hot spots of malaria incidence exhibited high spatial and temporal variations. Findings from the two statistics were similar, though the G*(d) detected cold spots using a higher distance band (5.5 km). The high spatial and temporal variability in the distribution of clusters of high malaria incidence seems to be consistent with an area of unstable malaria transmission. In such a context, a strong surveillance system and the use of spatial analysis may be crucial for targeted malaria control activities.
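The local Moran's I statistic used in the study above can be sketched with a fixed distance-band spatial weights matrix (the 3.5 km band from the abstract); village coordinates and incidence values here are simulated, with an artificial hot spot planted in one corner.

```python
import numpy as np

rng = np.random.default_rng(7)
coords = rng.uniform(0, 20, (100, 2))          # village locations, km
coords[:10] = rng.uniform(0, 2, (10, 2))       # a spatial cluster of villages
inc = rng.poisson(5, 100).astype(float)        # background incidence counts
inc[:10] += 30                                 # planted hot spot

# Distance-band spatial weights (3.5 km band), row-standardized
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
W = ((d > 0) & (d <= 3.5)).astype(float)
W /= np.maximum(W.sum(1, keepdims=True), 1)

# Local Moran's I: z_i times the spatial lag of z at i
z = (inc - inc.mean()) / inc.std()
local_I = z * (W @ z)
print(round(local_I[:10].mean(), 2), round(local_I[10:].mean(), 2))
```

Large positive local I values mark locations that are high and surrounded by high neighbors, which is precisely the hot-spot pattern the surveys screened for; significance in practice is assessed by permutation rather than by the raw statistic.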
Subject(s)
Cluster Analysis, Malaria/epidemiology, Malaria/transmission, Spatial Analysis, Cross-Sectional Studies, Environment, Guatemala/epidemiology, Humans, Incidence, Insecticide-Treated Bednets, Malaria/prevention & control, Seasons
ABSTRACT
In acquired immunodeficiency syndrome (AIDS) studies it is quite common to observe viral load measurements collected irregularly over time. Moreover, these measurements can be subject to upper and/or lower detection limits, depending on the quantification assays. A complication arises when these continuous repeated measures have a heavy-tailed behavior. For such data structures, we propose a robust structure for a censored linear model based on the multivariate Student's t-distribution. To compensate for the autocorrelation existing among irregularly observed measures, a damped exponential correlation structure is employed. An efficient expectation-maximization-type algorithm is developed for computing the maximum likelihood estimates, obtaining as by-products the standard errors of the fixed effects and the log-likelihood function. The proposed algorithm uses closed-form expressions at the E-step that rely on formulas for the mean and variance of a truncated multivariate Student's t-distribution. The methodology is illustrated through an application to a Human Immunodeficiency Virus-AIDS (HIV-AIDS) study and several simulation studies.
Subject(s)
Linear Models, Acquired Immunodeficiency Syndrome/virology, Algorithms, Biostatistics/methods, Computer Simulation, HIV-1, Humans, Likelihood Functions, Limit of Detection, Longitudinal Studies, Multivariate Analysis, RNA, Viral/blood, Viral Load/statistics & numerical data
ABSTRACT
The 2008-2009 Household Budget Survey (HBS 2008-2009) is a nationwide sample survey conducted by IBGE that collected anthropometric data on the weight and height of individuals in Brazil, which are important for assessing nutritional status. In a survey of this size, which relies on portable measuring equipment, the collected data are subject to contamination by measurement errors and non-response. Such errors can affect the calculation of prevalence indicators for malnutrition, overweight, and obesity, can impact different population segments differently, and can compromise analyses that support the planning of public policies in health, nutrition, social assistance, and other areas. This article compares the performance of the CIDAQ method, which was used to treat the anthropometric data in the HBS 2008-2009, with two other approaches for multivariate quantitative data: the TRC and BACON outlier-detection algorithms, both coupled with the POEM imputation algorithm. This comparison is essential to ensure that the best method is used in future surveys, preserving the reliability of the data for the studies that underpin public policy. The methods were compared by simulation on a subset of the HBS 2008-2009 weight and height data, considering the impact on estimates of the mean, the standard deviation, and the correlation between weight and height. The CIDAQ method showed a small advantage over the others in the parametric simulation, while the BACON method stood out in the non-parametric simulation...
Subject(s)
Humans, Male, Female, Child, Adolescent, Young Adult, Middle Aged, Anthropometry/methods, Nutritional Status, Body Mass Index, Brazil, Data Interpretation, Statistical, Statistics, Nonparametric, Weight by Height
ABSTRACT
BACKGROUND: Neuroimaging techniques combined with computational neuroanatomy have been playing a role in the investigation of healthy aging and Alzheimer's disease (AD). The definition of normative rules for brain features is a crucial step to establish typical and atypical aging trajectories. OBJECTIVE: To introduce an unsupervised pattern recognition method; to define multivariate normative rules of neuroanatomical measures; and to propose a brain abnormality index. METHODS: This study was based on a machine learning approach (one class classification or novelty detection) to neuroanatomical measures (brain regions, volume, and cortical thickness) extracted from the Alzheimer's Disease Neuroimaging Initiative (ADNI)'s database. We applied a ν-One-Class Support Vector Machine (ν-OC-SVM) trained with data from healthy subjects to build an abnormality index, which was compared with subjects diagnosed with mild cognitive impairment and AD. RESULTS: The method was able to classify AD subjects as outliers with an accuracy of 84.3% at a false alarm rate of 32.5%. The proposed brain abnormality index was found to be significantly associated with group diagnosis, clinical data, biomarkers, and future conversion to AD. CONCLUSION: These results suggest that one-class classification may be a promising approach to help in the detection of disease conditions. Our findings support a framework considering the continuum of brain abnormalities from healthy aging to AD, which is correlated with cognitive impairment and biomarkers measurements.
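The one-class-classification idea above can be sketched directly: train a One-Class SVM on feature vectors from healthy subjects only, then score new subjects so that more negative decision values indicate greater abnormality. The features below are simulated stand-ins for regional volumes and cortical thicknesses; the ν value is an arbitrary illustrative choice, not the paper's setting.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(8)
healthy = rng.normal(0.0, 1.0, (200, 10))      # healthy training group
patients = rng.normal(-1.5, 1.0, (50, 10))     # atrophy-like shift

scaler = StandardScaler().fit(healthy)
oc = OneClassSVM(nu=0.1, gamma="scale").fit(scaler.transform(healthy))

# Abnormality index: negated decision function (higher = more abnormal)
idx_healthy = -oc.decision_function(scaler.transform(healthy))
idx_patients = -oc.decision_function(scaler.transform(patients))
print(round(idx_healthy.mean(), 2), round(idx_patients.mean(), 2))
```

Because training uses only healthy data, the index needs no diseased subjects at all to define it, which is what makes the continuum interpretation in the conclusion possible.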
Subject(s)
Aging/pathology, Brain/pathology, Image Interpretation, Computer-Assisted/methods, Magnetic Resonance Imaging/methods, Support Vector Machine, Aged, Aged, 80 and over, Alzheimer Disease/classification, Alzheimer Disease/pathology, Cognitive Dysfunction/classification, Cognitive Dysfunction/pathology, Databases, Factual, Female, Humans, Male, Middle Aged, Multivariate Analysis, Organ Size, Pattern Recognition, Automated/methods, Sensitivity and Specificity, Unsupervised Machine Learning
ABSTRACT
Continuous (clustered) proportion data often arise in various domains of medicine and public health where the response variable of interest is a proportion (or percentage) quantifying disease status for the cluster units and ranging between zero and one. Because any study includes relatively disease-free as well as heavily diseased subjects, the proportion values can lie in the closed interval [0,1]. While beta regression can be adapted to assess covariate effects in these situations, its versatility is challenged by the presence or excess of zeros and ones, because the beta support is the open interval (0,1). To circumvent this, we augment the probabilities of zero and one with the beta density, controlling for the clustering effect. Our approach is Bayesian, with the ability to borrow information across the various stages of the complex model hierarchy, and produces a computationally convenient framework amenable to available freeware. The marginal likelihood is tractable and can be used to develop Bayesian case-deletion influence diagnostics based on q-divergence measures. Both simulation studies and an application to a real dataset from a clinical periodontology study quantify the gain in model fit and parameter estimation over ad hoc alternatives and provide quantitative insight into assessing the true covariate effects on the proportion responses.
Subject(s)
Bayes Theorem, Cluster Analysis, Likelihood Functions, Regression Analysis, Adult, Aged, Aged, 80 and over, Computer Simulation, Female, Humans, Male, Middle Aged, Periodontal Diseases/epidemiology
ABSTRACT
A common assumption in nonlinear mixed-effects models is the normality of both the random effects and the within-subject errors. However, such assumptions make inferences vulnerable to the presence of outliers. More flexible distributions are therefore necessary for modeling both sources of variability in this class of models. In the present paper, I consider an extension of nonlinear mixed-effects models in which the random effects and within-subject errors are assumed to follow a rich class of parametric models often used for robust inference: the scale mixtures of multivariate normal distributions, a wide class of symmetric and continuous distributions. This class includes heavy-tailed multivariate distributions such as the Student's t, slash, and contaminated normal. With scale mixtures of multivariate normal distributions, robustification is achieved through the tail behavior of the different distributions. A Bayesian framework is adopted, and MCMC is used to carry out the posterior analysis. Model comparison using different criteria is considered. The procedures are illustrated using a real dataset from a pharmacokinetic study. I contrast results from the normal and robust models and show how the implementation can be used to detect outliers.
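The heavy-tailed idea above can be shown in its simplest form: fit a location parameter to contaminated data under a normal model and under a Student's t model (itself a scale mixture of normals). The t fit is far less affected by the outliers. This sketch drops the entire nonlinear mixed-effects structure and uses maximum likelihood instead of MCMC, purely to illustrate the robustification mechanism.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
x = rng.normal(5.0, 1.0, 200)
x[:20] += 25.0                                  # 10% gross contamination

mu_normal = x.mean()                            # normal MLE of location
df, mu_t, scale_t = stats.t.fit(x)              # t MLE (df estimated too)
print(round(mu_normal, 2), round(mu_t, 2))
```

The estimated degrees of freedom shrink to accommodate the heavy tail, so the outlying points are implicitly downweighted; in the mixed-model setting the same mechanism protects both the fixed effects and the random-effects estimates.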
Subject(s)
Bayes Theorem, Nonlinear Dynamics, Humans, Likelihood Functions, Normal Distribution, Theophylline/pharmacokinetics
ABSTRACT
The robust analysis of variance method proposed by Bertaccini and Varriale (2006) allows the effect of outliers to be monitored during the statistical modeling process. To that end, the sample units are allocated to subsets based solely on an inspection of the data. With the purpose of extending this method to the Poisson model, the present work monitors the effect of outliers on the number of AIDS cases diagnosed in Brazil from 2003 to 2006. The proposed methodology proved viable and is recommended for count data; it is therefore a useful data-analysis technique for identifying outliers in samples and can be applied to other generalized models with appropriate changes in how the residuals are obtained.