ABSTRACT
Acute physical activity leads to several changes in metabolic, cardiovascular, and immune pathways. Although studies have examined selected changes in these pathways, the system-wide molecular response to an acute bout of exercise has not been fully characterized. We performed longitudinal multi-omic profiling of plasma and peripheral blood mononuclear cells including metabolome, lipidome, immunome, proteome, and transcriptome from 36 well-characterized volunteers, before and after a controlled bout of symptom-limited exercise. Time-series analysis revealed thousands of molecular changes and an orchestrated choreography of biological processes involving energy metabolism, oxidative stress, inflammation, tissue repair, and growth factor response, as well as regulatory pathways. Most of these processes were dampened and some were reversed in insulin-resistant participants. Finally, we discovered biological pathways involved in cardiopulmonary exercise response and developed prediction models revealing potential resting blood-based biomarkers of peak oxygen consumption.
Subjects
Energy Metabolism/physiology; Exercise/physiology; Aged; Biomarkers/metabolism; Female; Humans; Insulin/metabolism; Insulin Resistance; Leukocytes, Mononuclear/metabolism; Longitudinal Studies; Male; Metabolome; Middle Aged; Oxygen/metabolism; Oxygen Consumption; Proteome; Transcriptome
ABSTRACT
Gene misexpression is the aberrant transcription of a gene in a context where it is usually inactive. Despite its known pathological consequences in specific rare diseases, we have a limited understanding of its wider prevalence and mechanisms in humans. To address this, we analyzed gene misexpression in 4,568 whole-blood bulk RNA sequencing samples from INTERVAL study blood donors. We found that while individual misexpression events occur rarely, in aggregate they were found in almost all samples and a third of inactive protein-coding genes. Using 2,821 paired whole-genome and RNA sequencing samples, we identified that misexpression events are enriched in cis for rare structural variants. We established putative mechanisms through which a subset of SVs lead to gene misexpression, including transcriptional readthrough, transcript fusions, and gene inversion. Overall, we develop misexpression as a type of transcriptomic outlier analysis and extend our understanding of the variety of mechanisms by which genetic variants can influence gene expression.
Subjects
Gene Expression Regulation; Humans; Sequence Analysis, RNA; Genetic Variation; Genomic Structural Variation/genetics; Transcriptome/genetics; Blood Donors
ABSTRACT
Detection of aberrantly spliced genes is an important step in RNA-seq-based rare-disease diagnostics. We recently developed FRASER, a denoising autoencoder-based method that outperformed alternative methods of detecting aberrant splicing. However, because FRASER's three splice metrics are partially redundant and tend to be sensitive to sequencing depth, we introduce here a more robust intron-excision metric, the intron Jaccard index, that combines the alternative donor, alternative acceptor, and intron-retention signal into a single value. Moreover, we optimized model parameters and filter cutoffs by using candidate rare-splice-disrupting variants as independent evidence. On 16,213 GTEx samples, our improved algorithm, FRASER 2.0, typically called 10 times fewer splicing outliers while increasing the proportion of candidate rare-splice-disrupting variants by 10-fold and substantially decreasing the effect of sequencing depth on the number of reported outliers. To lower the multiple-testing correction burden, we introduce an option to select the genes to be tested for each sample instead of a transcriptome-wide approach. This option can be particularly useful when prior information, such as candidate variants or genes, is available. Application to 303 rare-disease samples confirmed the relative reduction in the number of outlier calls for a slight loss of sensitivity; FRASER 2.0 recovered 22 out of 26 previously identified pathogenic splicing cases with default cutoffs and 24 when multiple-testing correction was limited to OMIM genes containing rare variants. Altogether, these methodological improvements contribute to more effective RNA-seq-based rare-disease diagnostics by drastically reducing the number of splicing outlier calls per sample at minimal loss of sensitivity.
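The idea of folding donor, acceptor, and intron-retention signal into one ratio can be sketched as follows. This is a minimal illustration, assuming the index is the number of split reads supporting an intron divided by all reads (split and non-split) that touch either of its splice sites. The function name, argument names, and the exact counting rule are assumptions made for illustration and may differ from what FRASER 2.0 implements.

```python
def intron_jaccard(split_intron, split_donor_total, split_acceptor_total,
                   nonsplit_donor, nonsplit_acceptor):
    """Jaccard-style ratio for one intron (hypothetical helper, not the FRASER API).

    split_intron          split reads spanning exactly this intron
    split_donor_total     split reads using the intron's donor site (any acceptor)
    split_acceptor_total  split reads using the intron's acceptor site (any donor)
    nonsplit_donor        unspliced reads covering the donor site
    nonsplit_acceptor     unspliced reads covering the acceptor site
    """
    # The intron's own reads appear in both site totals, so subtract them once
    # to obtain the union of split reads touching either splice site.
    union_split = split_donor_total + split_acceptor_total - split_intron
    denominator = union_split + nonsplit_donor + nonsplit_acceptor
    if denominator == 0:
        return float("nan")  # no coverage at either splice site
    return split_intron / denominator


# Hypothetical counts: 80 reads support the intron, 120 reads touch its sites.
value = intron_jaccard(80, 100, 90, 5, 5)
```

A value close to 1 means nearly every read at the two splice sites supports the intron. Alternative donor or acceptor usage and intron retention all lower the same number, which is why a single metric can replace three partially redundant ones.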
Subjects
Alternative Splicing; RNA Splicing; Humans; Alternative Splicing/genetics; Introns/genetics; RNA Splicing/genetics; RNA-Seq; Algorithms
ABSTRACT
Quality control in quantitative proteomics is a persistent challenge, particularly in identifying and managing outliers. Unsupervised learning models, which rely on data structure rather than predefined labels, offer potential solutions. However, without clear labels, their effectiveness might be compromised. Single models are susceptible to the randomness of parameters and initialization, which can result in a high rate of false positives. Ensemble models, on the other hand, have shown capabilities in effectively mitigating the impacts of such randomness and assisting in accurately detecting true outliers. Therefore, we introduced SEAOP, a Python toolbox that utilizes an ensemble mechanism by integrating multi-round data management and a statistics-based decision pipeline with multiple models. Specifically, SEAOP uses multi-round resampling to create diverse sub-data spaces and employs outlier detection methods to identify candidate outliers in each space. Candidates are then aggregated as confirmed outliers via a chi-square test, adhering to a 95% confidence level, to ensure the precision of the unsupervised approaches. Additionally, SEAOP introduces a visualization strategy, specifically designed to intuitively and effectively display the distribution of both outlier and non-outlier samples. Optimal hyperparameter models of SEAOP for outlier detection were identified by using a gradient-simulated standard dataset and Mann-Kendall trend test. The performance of the SEAOP toolbox was evaluated using three experimental datasets, confirming its reliability and accuracy in handling quantitative proteomics.
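The resample, detect, and aggregate pattern described above can be sketched in plain Python. This is an illustrative sketch, not the SEAOP API. It assumes a single hypothetical base detector (a robust z-score built from the median and MAD), a 2x2 chi-square test with one degree of freedom as the aggregation step, and made-up intensity values.

```python
import math
import random
import statistics


def mad_candidates(values, subset, threshold=3.5):
    """Hypothetical base detector: robust z-score within one resampled sub-space."""
    sample = [values[i] for i in subset]
    med = statistics.median(sample)
    mad = statistics.median([abs(v - med) for v in sample])
    if mad == 0:
        return []
    return [i for i in subset if 0.6745 * abs(values[i] - med) / mad > threshold]


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def chi2_p_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def ensemble_outliers(values, rounds=200, fraction=0.7, alpha=0.05, seed=0):
    rng = random.Random(seed)
    n = len(values)
    size = max(3, int(fraction * n))
    drawn = [0] * n    # how often each sample was resampled
    flagged = [0] * n  # how often it was flagged as a candidate
    for _ in range(rounds):
        subset = rng.sample(range(n), size)
        for i in subset:
            drawn[i] += 1
        for i in mad_candidates(values, subset):
            flagged[i] += 1
    total_drawn, total_flagged = sum(drawn), sum(flagged)
    confirmed = []
    for i in range(n):
        a, b = flagged[i], drawn[i] - flagged[i]
        c = total_flagged - a
        d = (total_drawn - drawn[i]) - c
        flagged_more_often = a * (c + d) > c * (a + b)
        # Confirm only if this sample is flagged significantly more often
        # than all other samples taken together (95% confidence by default).
        if flagged_more_often and chi2_p_1df(chi2_2x2(a, b, c, d)) < alpha:
            confirmed.append(i)
    return confirmed


# Hypothetical intensities; the sample at index 9 is the planted outlier.
values = [10.0, 10.2, 9.9, 10.1, 9.8, 10.3, 10.0, 9.7, 10.1, 25.0, 9.9, 10.2]
confirmed = ensemble_outliers(values)
```

Aggregating over many resampled sub-spaces is what dampens the randomness of any single run. A sample is confirmed only when it is flagged consistently, not because of one unlucky draw.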
Subjects
Data Management; Proteomics; Reproducibility of Results; Quality Control; Data Interpretation, Statistical
ABSTRACT
Protein structure, both at the global and local level, dictates function. Proteins fold from chains of amino acids, forming secondary structures, α-helices and β-strands, that, at least for globular proteins, subsequently fold into a three-dimensional structure. Here, we show that a Ramachandran-type plot focusing on the two dihedral angles separated by the peptide bond, and entirely contained within an amino acid pair, defines a local structural unit. We further demonstrate the usefulness of this cross-peptide-bond Ramachandran plot by showing that it captures β-turn conformations in coil regions, that traditional Ramachandran plot outliers fall into occupied regions of our plot, and that thermophilic proteins prefer specific amino acid pair conformations. Further, we demonstrate experimentally that the effect of a point mutation on backbone conformation and protein stability depends on the amino acid pair context, i.e., the identity of the adjacent amino acid, in a manner predictable by our method.
Subjects
Amino Acids; Proteins; Amino Acids/chemistry; Proteins/genetics; Proteins/chemistry; Protein Structure, Secondary; Protein Conformation, alpha-Helical; Peptides/chemistry; Protein Conformation
ABSTRACT
Entropic outlier sparsification (EOS) is proposed as a cheap and robust computational strategy for learning in the presence of data anomalies and outliers. EOS builds on the derived analytic solution of the (weighted) expected loss minimization problem subject to Shannon entropy regularization. The identified closed-form solution is proven to impose additional costs that depend linearly on statistics size and are independent of data dimension. The analytic results also explain why mixtures of spherically symmetric Gaussians, used heuristically in many popular data analysis algorithms, represent an optimal and least-biased choice for the nonparametric probability distributions when working with squared Euclidean distances. The performance of EOS is compared to a range of commonly used tools on synthetic problems and on partially mislabeled supervised classification problems from biomedicine. Applying EOS for coinference of data anomalies during learning is shown to allow reaching an accuracy of [Formula: see text] when predicting patient mortality after heart failure, statistically significantly outperforming the predictive performance of common learning tools for the same data.
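A minimal sketch of why such a closed form is cheap, assuming the objective has the generic form "minimize sum_i w_i * loss_i + eps * sum_i w_i * log(w_i) over the probability simplex". Under that assumption the minimizer is the Gibbs (softmax) distribution w_i proportional to exp(-loss_i / eps). The function name, `eps`, and the loss values are illustrative, and the paper's exact formulation may differ.

```python
import math


def entropic_weights(losses, eps):
    """Closed-form sample weights for an entropy-regularized weighted loss.

    One pass over the losses: cost grows linearly with the number of
    samples and does not depend on the dimension of the data.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    shift = min(losses)  # subtract the minimum for numerical stability
    unnormalized = [math.exp(-(loss - shift) / eps) for loss in losses]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical per-sample losses; the last sample behaves like an outlier.
losses = [0.10, 0.20, 0.15, 5.00]
weights = entropic_weights(losses, eps=0.5)
```

Samples with large loss receive weights close to zero, so they are effectively sparsified out of the fit. A larger `eps` pushes the weights back toward uniform.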
ABSTRACT
edgeR (Robust) is a popular approach for identifying differentially expressed genes (DEGs) from RNA-Seq profiles. However, it performs poorly in the presence of gene-specific outliers and cannot handle missing observations. To address these issues, we propose a pre-processing approach for RNA-Seq count data that combines iLOO-based outlier detection with random forest-based imputation of missing values to boost the performance of edgeR (Robust). Results from both simulated and real RNA-Seq count data showed that the proposed edgeR (Robust) outperformed the conventional edgeR (Robust). To investigate the usefulness of the identified DEGs for the diagnosis and therapy of ovarian cancer (OC), we selected the 12 top-ranked DEGs (IL6, XCL1, CXCL8, C1QC, C1QB, SNAI2, TYROBP, COL1A2, SNAP25, NTS, CXCL2, and AGT) and suggested 10 top-ranked candidate drug molecules, guided by the hub DEGs, for the treatment of OC. Hence, our proposed procedure might be an effective computational tool for exploring potential DEGs from RNA-Seq profiles for the diagnosis and therapy of any disease.
Subjects
Biomarkers, Tumor; Ovarian Neoplasms; RNA-Seq; Humans; Ovarian Neoplasms/genetics; Ovarian Neoplasms/diagnosis; Ovarian Neoplasms/therapy; Female; Biomarkers, Tumor/genetics; Software; Transcriptome; Gene Expression Profiling
ABSTRACT
The identification of proteoforms by top-down proteomics requires both high-quality fragmentation spectra and the neutral mass of the proteoform from which the fragments derive. Intact proteoform spectra can be highly complex and may include multiple overlapping proteoforms, as well as many isotopic peaks and charge states. The resulting lower signal-to-noise ratios for intact proteins complicate downstream analyses such as deconvolution. Averaging multiple scans is a common way to improve signal-to-noise, but mass spectrometry data contain distinctive artifacts that can degrade the quality of an averaged spectrum. To overcome these limitations and increase signal-to-noise, we have implemented outlier rejection algorithms to remove outlier measurements efficiently and robustly in a set of MS1 scans prior to averaging. We have implemented averaging with rejection algorithms in the open-source, freely available proteomics search engine MetaMorpheus. Herein, we report the application of the averaging with rejection algorithms to direct injection and online liquid chromatography mass spectrometry data. Averaging with rejection algorithms demonstrated a 45% increase in the number of proteoforms detected in Jurkat T cell lysate. We show that the increase is due to improved spectral quality, particularly in regions surrounding isotopic envelopes.
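Averaging with rejection can be sketched as follows. This is a minimal illustration using plain sigma clipping, one common rejection rule; it is not the MetaMorpheus implementation. It assumes the scans have already been binned onto a shared m/z grid, and the function name, the `n_sigma` default, and the intensity values are illustrative.

```python
import statistics


def average_with_sigma_clipping(scans, n_sigma=1.5, max_iter=5):
    """Average equal-length intensity lists bin by bin after rejecting outliers.

    scans: list of scans, each a list of intensities on the same m/z grid.
    """
    averaged = []
    for intensities in zip(*scans):  # one m/z bin across all scans
        kept = list(intensities)
        for _ in range(max_iter):
            if len(kept) < 3:
                break
            mean = statistics.fmean(kept)
            sd = statistics.pstdev(kept)
            if sd == 0:
                break
            survivors = [x for x in kept if abs(x - mean) <= n_sigma * sd]
            if len(survivors) == len(kept) or not survivors:
                break  # nothing rejected, or everything would be rejected
            kept = survivors
        averaged.append(statistics.fmean(kept))
    return averaged


# Hypothetical data: five scans, two m/z bins; one scan has a spike in bin 0.
scans = [[100, 10], [102, 11], [98, 9], [101, 10], [500, 10]]
averaged = average_with_sigma_clipping(scans)
```

A plain mean of the first bin would be pulled to about 180 by the single spike, whereas the clipped average stays near 100. With only a handful of scans a small `n_sigma` trims aggressively, so the threshold is a tuning choice.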
Subjects
Proteome; Proteomics; Proteome/analysis; Proteomics/methods; Protein Processing, Post-Translational; Algorithms; Mass Spectrometry
ABSTRACT
This paper (i) explores the internal structure of two quantum mechanics datasets (QM7b, QM9), composed of several thousands of organic molecules and described in terms of electronic properties, and (ii) further explores an inverse design approach to molecular design consisting of using machine learning methods to approximate the atomic composition of molecules, using QM9 data. Understanding the structure and characteristics of this kind of data is important when predicting the atomic composition from physical-chemical properties in inverse molecular designs. Intrinsic dimension analysis, clustering, and outlier detection methods were used in the study. They revealed that for both datasets the intrinsic dimensionality is several times smaller than the descriptive dimensions. The QM7b data is composed of well-defined clusters related to atomic composition. The QM9 data consists of an outer region predominantly composed of outliers, and an inner, core region that concentrates clustered inlier objects. A significant relationship exists between the number of atoms in the molecule and its outlier/inlier nature. The spatial structure exhibits a relationship with molecular weight. Despite the structural differences between the two datasets, the predictability of variables of interest for inverse molecular design is high. This is exemplified by models estimating the number of atoms of the molecule from both the original properties and from lower dimensional embedding spaces. In the generative approach the input is given by a set of desired properties of the molecule and the output is an approximation of the atomic composition in terms of its constituent chemical elements. This could serve as the starting region for further search in the huge space determined by the set of possible chemical compounds. The quantum mechanics dataset QM9, composed of 133,885 small organic molecules and 19 electronic properties, is used in the study.
Different multi-target regression approaches were considered for predicting the atomic composition from the properties, including feature engineering techniques in an auto-machine learning framework. High-quality models were found that predict the atomic composition of the molecules from their electronic properties, as well as from a subset of only 52.6% size. Feature selection worked better than feature generation. The results validate the generative approach to inverse molecular design.
ABSTRACT
PURPOSE: To present and assess an outlier mitigation method that makes free-running volumetric cardiovascular MRI (CMR) more robust to motion. METHODS: The proposed method, called compressive recovery with outlier rejection (CORe), models outliers in the measured data as an additive auxiliary variable. We enforce MR physics-guided group sparsity on the auxiliary variable, and jointly estimate it along with the image using an iterative algorithm. For evaluation, CORe is first compared to traditional compressed sensing (CS), robust regression (RR), and an existing outlier rejection method using two simulation studies. Then, CORe is compared to CS using seven three-dimensional (3D) cine, 12 rest four-dimensional (4D) flow, and eight stress 4D flow imaging datasets. RESULTS: Our simulation studies show that CORe outperforms CS, RR, and the existing outlier rejection method in terms of normalized mean square error and structural similarity index across 55 different realizations. The expert reader evaluation of 3D cine images demonstrates that CORe is more effective in suppressing artifacts while maintaining or improving image sharpness. Finally, 4D flow images show that CORe yields more reliable and consistent flow measurements, especially in the presence of involuntary subject motion or exercise stress. CONCLUSION: An outlier rejection method is presented and tested using simulated and measured data. This method can help suppress motion artifacts in a wide range of free-running CMR applications.
Subjects
Algorithms; Imaging, Three-Dimensional; Magnetic Resonance Imaging, Cine; Humans; Imaging, Three-Dimensional/methods; Magnetic Resonance Imaging, Cine/methods; Artifacts; Computer Simulation; Motion; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging/methods; Image Interpretation, Computer-Assisted/methods; Reproducibility of Results; Heart/diagnostic imaging
ABSTRACT
Senescence is an irreversible arrest of the cell cycle that can be characterized by markers of senescence such as p16, p21, and KI-67. The characterization of different senescence-associated phenotypes requires selection of the most relevant senescence markers to define reliable cytometric methodologies. Mass cytometry (a.k.a. Cytometry by time of flight, CyTOF) can monitor up to 40 different cell markers at the single-cell level and has the potential to integrate multiple senescence and other phenotypic markers to identify senescent cells within a complex tissue such as skeletal muscle, with greater accuracy and scalability than traditional bulk measurements and flow cytometry-based measurements. This article introduces an analysis framework for detecting putative senescent cells based on clustering, outlier detection, and Boolean logic for outliers. Results show that the pipeline can identify putative senescent cells in skeletal muscle with well-established markers such as p21 and potential markers such as GAPDH. It was also found that heterogeneity of putative senescent cells in skeletal muscle can partly be explained by their cell type. Additionally, autophagy-related proteins ATG4A, LRRK2, and GLB1 were identified as important proteins in predicting the putative senescent population, providing insights into the association between autophagy and senescence. It was observed that sex did not affect the proportion of putative senescent cells among total cells. However, age did have an effect, with a higher proportion observed in fibro/adipogenic progenitors (FAPs), satellite cells, M1 and M2 macrophages from old mice. Moreover, putative senescent cells from muscle of old and young mice show different expression levels of senescence-related proteins, with putative senescent cells of old mice having higher levels of p21 and GAPDH, whereas putative senescent cells of young mice had higher levels of IL-6. 
Overall, the analysis framework prioritizes multiple senescence-associated proteins to characterize putative senescent cells sourced from tissue made of different cell types.
Subjects
Biomarkers; Cellular Senescence; Flow Cytometry; Muscle, Skeletal; Animals; Cellular Senescence/physiology; Mice; Muscle, Skeletal/cytology; Muscle, Skeletal/metabolism; Flow Cytometry/methods; Biomarkers/metabolism; Female; Male; Mice, Inbred C57BL; Cyclin-Dependent Kinase Inhibitor p21/metabolism; Single-Cell Analysis/methods
ABSTRACT
Plant pathogens are constantly under selection pressure for host resistance adaptation. Soybean cyst nematode (SCN, Heterodera glycines) is a major pest of soybean primarily managed through resistant cultivars; however, SCN populations have evolved virulence in response to selection pressures driven by repeated monoculture of the same genetic resistance. Resistance to SCN is mediated by multiple epistatic interactions between Rhg (for resistance to H. glycines) genes. However, the identity of SCN virulence genes that confer the ability to overcome resistance remains unknown. To identify candidate genomic regions showing signatures of selection for increased virulence, we conducted whole genome resequencing of pooled individuals (Pool-Seq) from two pairs of SCN populations adapted on soybeans with Peking-type (rhg1-a, rhg2, and Rhg4) resistance. Population differentiation and principal component analysis-based approaches identified approximately 0.72-0.79 million SNPs, the frequency of which showed potential selection signatures across multiple genomic regions. Chromosomes 3 and 6 between population pairs showed the greatest density of outlier SNPs with high population differentiation. Conducting multiple outlier detection tests to identify overlapping SNPs resulted in a total of 966 significantly differentiated SNPs, of which 285 exon SNPs were mapped to 97 genes. Of these, six genes encoded members of known stylet-secreted effector protein families potentially involved in host defence modulation including venom-allergen-like, annexin, glutathione synthetase, SPRYSEC, chitinase, and CLE effector proteins. Further functional analysis of identified candidate genes will provide new insights into the genetic mechanisms by which SCN overcomes soybean resistance and inform the development of molecular markers for rapidly screening the virulence profile of an SCN-infested field.
Subjects
Disease Resistance; Glycine max; Plant Diseases; Polymorphism, Single Nucleotide; Tylenchoidea; Animals; Glycine max/genetics; Glycine max/parasitology; Polymorphism, Single Nucleotide/genetics; Virulence/genetics; Plant Diseases/parasitology; Plant Diseases/genetics; Disease Resistance/genetics; Tylenchoidea/genetics; Tylenchoidea/pathogenicity; Selection, Genetic; Genetics, Population; Whole Genome Sequencing
ABSTRACT
STUDY QUESTION: Can we monitor post-oocyte retrieval infections in the French national health data system to complement the French ART vigilance system? SUMMARY ANSWER: Medico-administrative databases provide a more comprehensive view of post-oocyte retrieval infections and can be used to detect abnormal increases in frequency and outlier ART centers as a complementary tool to the ART vigilance system. WHAT IS KNOWN ALREADY: The various studies of ART complications are reassuring, showing relatively low overall complication rates. Nonetheless, the European Union has set up a vigilance system to monitor these complications. However, this system is not an exhaustive source of information and does not provide a complete overview of post-ART complications. STUDY DESIGN, SIZE, DURATION: The study population was identified from the comprehensive French national hospital discharge database. It included women under 46 years of age undergoing an oocyte retrieval in 2019, classified into three population subgroups according to the indication of oocyte retrieval: infertility (IF), fertility preservation (FP), and oocyte donation (OD). The study population included 52,098 women who had undergone 65,948 oocyte retrievals in 2019. PARTICIPANTS/MATERIALS, SETTING, METHODS: Hospital stays and delivery of antibiotics within 31 days after oocyte retrieval were analyzed. Women and infections were characterized according to various characteristics (age, comorbidities, indication of oocyte retrieval, type of hospital stay, length of hospital stay, type of antibiotherapy, etc.). Multivariate analysis was performed to determine the relation between the occurrence of infection and women's characteristics, and results are expressed as odds ratios (ORs) and 95% CI. A funnel plot and a box plot were used to compare the infection rate per center with the national average and to detect outliers.
MAIN RESULTS AND THE ROLE OF CHANCE: Infections in the month following the oocyte retrieval represented 6.9% of the procedures in 2019 (n = 4522). Of these infections, 112 were hospitalized (0.2% of oocyte retrievals), and 4410 were non-hospitalized (6.7% of oocyte retrievals). The hospitalized infections were essentially gynecological infections (40.9%) and urinary tract infections (23.5%). In 87.9% of non-hospitalized infections, a single antibiotic therapy was prescribed. Mixed-effect model analysis showed that the risk of infection was significantly higher in women under 30 years of age, in the FP population, in supplementary universal health coverage (CMU-C) beneficiaries, and women with endometriosis. Funnel plot and box plot analysis showed that three ART centers have an infection rate significantly higher than the national average. In the three centers that stand out from all the others, the objective is to return to these centers to understand the possible reasons for this observed rate and to implement corrective measures. LIMITATIONS, REASONS FOR CAUTION: Despite all its advantages, the French national health data system presents some limitations, such as the risk of inappropriate coding. Another limitation of this study is that we cannot confirm an attributable relation between the infection and the ART procedure, even if the delay of 31 days after oocyte retrieval is consistent with the occurrence of a post-retrieval complication. In addition, antibiotics may be prescribed as a 'precautionary' measure in certain situations (women with a susceptibility to infection, complicated procedures), or as antibiotic prophylaxis for embryo transfer. WIDER IMPLICATIONS OF THE FINDINGS: Despite the limits in identifying post-ART infections in medico-administrative databases, this approach is a promising way to complement the ART vigilance reporting system. 
This concept developed for infections will also be generalized to other complications with regular feedback to professionals. STUDY FUNDING/COMPETING INTEREST(S): No specific funding was sought for the study. The study was supported by the Agence de la biomédecine, France. The authors declare that they have no conflict of interest. TRIAL REGISTRATION NUMBER: N/A.
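The funnel-plot comparison of each center with the overall average can be sketched as follows. This is a minimal illustration, assuming normal-approximation control limits around the pooled proportion. The center names, counts, and the choice of z = 3 are hypothetical and are not taken from the study.

```python
import math


def funnel_limits(n, p0, z=3.0):
    """Lower and upper control limits for a proportion observed on n procedures."""
    half_width = z * math.sqrt(p0 * (1.0 - p0) / n)
    return max(0.0, p0 - half_width), min(1.0, p0 + half_width)


def flag_high_centers(centers, z=3.0):
    """Return centers whose rate lies above the upper funnel limit.

    centers: dict mapping center name -> (infections, procedures).
    """
    total_events = sum(events for events, _ in centers.values())
    total_n = sum(n for _, n in centers.values())
    p0 = total_events / total_n  # pooled rate stands in for the national average
    flagged = []
    for name, (events, n) in centers.items():
        _, upper = funnel_limits(n, p0, z)
        if events / n > upper:
            flagged.append(name)
    return flagged


# Hypothetical counts for four centers.
centers = {"A": (70, 1000), "B": (65, 1000), "C": (150, 1000), "D": (30, 500)}
high = flag_high_centers(centers)
```

The limits widen as the number of procedures shrinks, so a small center is not flagged merely because its rate is noisier than that of a large one.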
Subjects
Infertility; Oocyte Retrieval; Female; Humans; Pregnancy; Anti-Bacterial Agents/therapeutic use; Embryo Transfer; Fertilization in Vitro/methods; Infertility/therapy; Oocyte Retrieval/adverse effects; Oocyte Retrieval/methods; Pregnancy Rate; Retrospective Studies
ABSTRACT
BACKGROUND: Early diagnosis and prompt treatment of malaria in young children are crucial for preventing the serious stages of the disease. If delayed treatment-seeking habits are observed in certain areas, targeted campaigns and interventions can be implemented to improve the situation. METHODS: This study applied multivariate binary logistic regression model diagnostics and geospatial logistic model to identify traditional authorities in Malawi where caregivers have unusual health-seeking behaviour for childhood malaria. The data from the 2021 Malawi Malaria Indicator Survey were analysed using R software version 4.3.0 for regressions and STATA version 17 for data cleaning. RESULTS: Both models showed significant variability in treatment-seeking habits of caregivers between villages. The mixed-effects logit model residual identified Vuso Jere, Kampingo Sibande, Ngabu, and Dzoole as outliers in the model. Despite characteristics that promote late reporting of malaria at clinics, most mothers in these traditional authorities sought treatment within twenty-four hours of the onset of malaria symptoms in their children. On the other hand, the geospatial logit model showed that late seeking of malaria treatment was prevalent in most areas of the country, except a few traditional authorities such as Mwakaboko, Mwenemisuku, Mwabulambya, Mmbelwa, Mwadzama, Zulu, Amidu, Kasisi, and Mabuka. CONCLUSIONS: These findings suggest that using a combination of multivariate regression model residuals and geospatial statistics can help in identifying communities with distinct treatment-seeking patterns for childhood malaria within a population. Health policymakers could benefit from consulting traditional authorities who demonstrated early reporting for care in this study. This could help in understanding the best practices followed by mothers in those areas which can be replicated in regions where seeking care is delayed.
Subjects
Malaria; Patient Acceptance of Health Care; Malawi; Humans; Malaria/prevention & control; Malaria/epidemiology; Patient Acceptance of Health Care/statistics & numerical data; Child, Preschool; Logistic Models; Infant; Female; Male; Adult; Child; Young Adult; Adolescent
ABSTRACT
Meta-analysis is a widely used tool for synthesizing results from multiple studies. The collected studies are deemed heterogeneous when they do not share a common underlying effect size; thus, the factors attributable to the heterogeneity need to be carefully considered. A critical problem in meta-analyses and systematic reviews is that outlying studies are frequently included, which can lead to invalid conclusions and affect the robustness of decision-making. Outliers may be caused by several factors such as study selection criteria, low study quality, small-study effects, and so on. Although outlier detection is well-studied in the statistical community, limited attention has been paid to meta-analysis. The conventional outlier detection method in meta-analysis is based on a leave-one-study-out procedure. However, when calculating a potentially outlying study's deviation, other outliers could substantially impact its result. This article proposes an iterative method to detect potential outliers, which reduces such an impact that could confound the detection. Furthermore, we adopt bagging to provide valid inference for sensitivity analyses of excluding outliers. Based on simulation studies, the proposed iterative method yields smaller bias and heterogeneity after performing a sensitivity analysis to remove the identified outliers. It also provides higher accuracy on outlier detection. Two case studies are used to illustrate the proposed method's real-world performance.
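The contrast between a single leave-one-study-out pass and an iterative procedure can be sketched as follows. This is an illustrative stand-in, not the authors' algorithm. It assumes fixed-effect inverse-variance pooling and a simple rule that removes the most deviant study and re-pools until no study exceeds the critical value. The effect sizes and variances are made up.

```python
import math


def pooled(effects, variances):
    """Fixed-effect inverse-variance pooled estimate and its variance."""
    weights = [1.0 / v for v in variances]
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return estimate, 1.0 / sum(weights)


def iterative_outliers(effects, variances, z_crit=1.96):
    """Repeatedly drop the study that deviates most from the pool of the others."""
    active = list(range(len(effects)))
    outliers = []
    while len(active) > 2:
        worst, worst_z = None, 0.0
        for i in active:
            rest = [j for j in active if j != i]
            est, var = pooled([effects[j] for j in rest],
                              [variances[j] for j in rest])
            # Deviation of study i from the estimate pooled without it.
            z = abs(effects[i] - est) / math.sqrt(variances[i] + var)
            if z > worst_z:
                worst, worst_z = i, z
        if worst is None or worst_z <= z_crit:
            break
        outliers.append(worst)
        active.remove(worst)  # later rounds no longer see this study
    return sorted(outliers)


# Hypothetical effect sizes with equal variances; studies 4 and 6 are planted outliers.
effects = [0.20, 0.25, 0.18, 0.22, 1.50, 0.21, 1.40]
variances = [0.01] * 7
found = iterative_outliers(effects, variances)
```

In a single leave-one-out pass on these numbers, the two planted outliers inflate the pooled estimate enough that ordinary studies such as the third one also exceed the threshold. Removing the outliers one at a time and re-pooling avoids that contamination.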
Subjects
Meta-Analysis as Topic; Systematic Reviews as Topic; Humans; Bias; Computer Simulation
ABSTRACT
BACKGROUND: Outliers, data points that significantly deviate from the norm, can have a substantial impact on statistical inference and provide valuable insights in data analysis. Multiple methods have been developed for outlier detection; however, almost all available approaches fail to consider the spatial dependence and heterogeneity in spatial data. Spatial data has diverse formats and semantics, requiring specialized outlier detection methodology to handle these unique properties. To date, limited research exists on robust spatial outlier detection methods designed specifically under the spatial error model (SEM) structure. METHOD: We propose the Spatial-Θ-Iterative Procedure for Outlier Detection (Spatial-Θ-IPOD), which utilizes a mean-shift vector to identify outliers within the SEM. Our method enables an effective detection of spatial outliers while also providing robust coefficient estimates. To assess the performance of our approach, we conducted extensive simulations and applied it to a real-world empirical study using life expectancy data from multiple countries. RESULTS: Simulation results showed that the masking and JD (Joint Detection) indicators of our Spatial-Θ-IPOD method outperformed several commonly used methods, even in high-dimensional scenarios, demonstrating stable performance. Conversely, the Θ-IPOD method proved to be ineffective in detecting outliers when spatial correlation was present. Moreover, our model successfully provided reliable coefficient estimation alongside outlier detection. The proposed method consistently outperformed other models (both robust and non-robust) in most cases. In the empirical study, our proposed model successfully detected outliers and provided valuable insights in the modeling process. CONCLUSIONS: Our proposed Spatial-Θ-IPOD offers an effective solution for detecting spatial outliers for SEM while providing robust coefficient estimates.
Notably, our approach showcases its relative superiority even in the presence of high leverage points. By successfully identifying outliers, our method enhances the overall understanding of the data and provides valuable insights for further analysis.
ABSTRACT
BACKGROUND: Several strategies for identifying biologically implausible values in longitudinal anthropometric data have recently been proposed, but the suitability of these strategies for large population datasets needs to be better understood. This study evaluated the impact of removing population outliers and the additional value of identifying and removing longitudinal outliers on the trajectories of length/height and weight and on the prevalence of child growth indicators in a large longitudinal dataset of child growth data. METHODS: Length/height and weight measurements of children aged 0 to 59 months from the Brazilian Food and Nutrition Surveillance System were analyzed. Population outliers were identified using z-scores from the World Health Organization (WHO) growth charts. After identifying and removing population outliers, residuals from linear mixed-effects models were used to flag longitudinal outliers. The following cutoffs for residuals were tested to flag those: -3/+3, -4/+4, -5/+5, -6/+6. The selected child growth indicators included length/height-for-age z-scores and weight-for-age z-scores, classified according to the WHO charts. RESULTS: The dataset included 50,154,738 records from 10,775,496 children. Boys and girls had 5.74% and 5.31% of length/height and 5.19% and 4.74% of weight values flagged as population outliers, respectively. After removing those, the percentage of longitudinal outliers varied from 0.02% (<-6/>+6) to 1.47% (<-3/>+3) for length/height and from 0.07 to 1.44% for weight in boys. In girls, the percentage of longitudinal outliers varied from 0.01 to 1.50% for length/height and from 0.08 to 1.45% for weight. The initial removal of population outliers played the most substantial role in the growth trajectories as it was the first step in the cleaning process, while the additional removal of longitudinal outliers had lower influence on those, regardless of the cutoff adopted. 
The prevalence of the selected indicators was also affected by population outliers and, to a lesser extent, by longitudinal outliers. CONCLUSIONS: Although both the population and the longitudinal approach can detect biologically implausible values in child growth data, removing population outliers seemed more relevant in this large administrative dataset, especially for calculating summary statistics. However, both types of outliers need to be identified and removed for the proper evaluation of trajectories.
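The residual-cutoff step described in the methods can be illustrated with a minimal sketch. It assumes the residuals from a fitted model are already available as a plain list and that they are standardized by their own mean and standard deviation; the abstract does not state how the residuals were scaled, and the function names `flag_longitudinal_outliers` and `percent_flagged` are hypothetical, not taken from the study.

```python
from statistics import mean, pstdev


def flag_longitudinal_outliers(residuals, cutoff):
    """Flag residuals whose standardized value lies outside -cutoff/+cutoff.

    Assumption: residuals are standardized with their own mean and
    population standard deviation; the study may have scaled them differently.
    """
    mu = mean(residuals)
    sd = pstdev(residuals)
    if sd == 0:
        # No spread at all, so nothing can be flagged.
        return [False] * len(residuals)
    return [abs((r - mu) / sd) > cutoff for r in residuals]


def percent_flagged(residuals, cutoffs=(3, 4, 5, 6)):
    """Percentage of records flagged at each of the tested cutoffs."""
    n = len(residuals)
    return {
        c: 100.0 * sum(flag_longitudinal_outliers(residuals, c)) / n
        for c in cutoffs
    }


if __name__ == "__main__":
    # Toy residuals: 99 unremarkable values and one extreme measurement.
    toy_residuals = [0.0] * 99 + [10.0]
    print(percent_flagged(toy_residuals))
```

Comparing the percentages across cutoffs in this way mirrors how the abstract reports the share of longitudinal outliers from the widest (-6/+6) to the narrowest (-3/+3) threshold.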
Subjects
Body Height, Growth Charts, Child, Male, Female, Humans, Body Weight, Brazil/epidemiology, Anthropometry
ABSTRACT
BACKGROUND: Women's levels of education and fertility are commonly associated. In Sub-Saharan Africa, the pace of decreasing fertility rates varies greatly, and this is linked to women's levels of education. However, this association may be influenced by outlying women who have unusual values on both variables. Despite this, most studies of this association have analysed the data only descriptively, without taking into account the effect of potential outliers. This study aimed to examine the presence and impact of outlier women on the relationship between female education and fertility in Malawi, using regression methods. METHODS: To analyse the correlation between women's schooling and fertility and evaluate the effect of outliers on this relationship, a bivariate Poisson model was applied to three recent demographic and health surveys in Malawi. R version 4.3.0 was used for model fitting, outlier computations, and correlation analysis. Stata version 12.0 was used for data cleaning. RESULTS: The findings revealed a correlation of -0.68 to -0.61 between schooling and fertility over 15 years in Malawi. A few outlier women were identified, most of whom had attended either 0 or at least 9 years of schooling and had borne either 0 or at least 5 children. The majority of the outliers were non-users of modern contraceptive methods and worked as domestic workers or were unemployed. Removing the outliers from the analysis led to marked changes in the fixed-effect sizes and slight shifts in correlation, but not in the direction and significance of the estimates. The woman's marital status, occupation, household wealth, age at first sex, and usage of modern contraceptives exhibited significant effects on education and fertility outcomes. CONCLUSION: There is a high negative correlation between female schooling and fertility in Malawi. 
The outlier women identified had attended either zero or at least nine years of schooling and had borne either zero or at least five children. Most of them were non-users of modern contraceptives and were domestic workers. Their impact on the regression estimates was substantial, but their impact on the correlation was minimal. Their identification highlights the need for policymakers to reconsider implementation strategies for modern contraceptive methods to make them more effective.
Subjects
Contraceptive Agents, Fertility, Child, Female, Humans, Malawi, Educational Status, Socioeconomic Factors
ABSTRACT
The genetic diversity found in natural populations is the result of evolutionary forces acting in response to historical and contemporary factors. The environmental characteristics and geological history of Mexico promoted the evolution and diversification of plant species, including wild relatives of crops such as the wild pumpkins (Cucurbita). Wild pumpkin species are found in a variety of habitats, evidencing their capability to adapt to different environments. Despite the potential value of wild Cucurbita as a genetic reservoir for crops, there is a lack of studies on their genetic diversity. Cucurbita radicans is an endangered species threatened by habitat destruction, which has led to low densities in small and isolated populations. Here, we analyze genotyping-by-sequencing genomic data of the wild pumpkin C. radicans to evaluate the influence of factors such as isolation, demographic history, and the environment in shaping the amount and distribution of its genetic variation. We analyzed 91 individuals from 14 localities along its reported distribution. We obtained 5,107 SNPs and found medium-high levels of genetic diversity and genetic structure distributed in four main geographic areas with different environmental conditions. Moreover, we found signals of demographic growth related to historical climatic shifts. Outlier loci analysis showed significant association with the environment, principally with precipitation variables. Also, the outlier loci displayed differential changes in their frequencies in response to future global climate change scenarios. Using the results of genetic structure, outlier loci, and multivariate analyses of the environmental conditions, we propose priority localities for conservation that encompass most of the genetic diversity of C. radicans.
Subjects
Cucurbita, Endangered Species, Genetic Variation, Cucurbita/genetics, Mexico, Conservation of Natural Resources, Single Nucleotide Polymorphism, Plant Genome, Genotype, Genomics, Ecosystem, Climate Change, Environment
ABSTRACT
The accurate perception of groups with outliers can help us identify potential risks. However, it is unclear how outliers affect the perception of group emotion. To address this question, we conducted a study on group emotion perception in the context of facial identity. We presented 74 participants with pictures of crowds, and asked them to evaluate the valence ratios and intensity of the crowd by means of the Emotional Aperture Measure. The results revealed that outlier emotions were often overestimated within crowds. Moreover, we found that the emotional expression of a close friend modulated the perception of outliers. Specifically, when a close friend expressed the group emotion, participants overestimated the outlier less than when a close friend expressed the outlier emotion. These results suggest that people can detect outliers within groups, and that their perception of group emotion is influenced by close friends. Thus, we provide evidence that facial identity affects group emotion perception.