1.
Epigenomics ; : 1-14, 2024 Aug 02.
Article in English | MEDLINE | ID: mdl-39093129

ABSTRACT

DNA methylation (DNAm)-based deconvolution estimates contain relative data, forming a composition, that standard methods (testing directly on cell proportions) are ill-suited to handle. In this study we examined the performance of an alternative method, analysis of compositions of microbiomes (ANCOM), for the analysis of DNAm-based deconvolution estimates. We performed two different simulation studies comparing ANCOM to a standard approach (two-sample t-test performed directly on cell proportions) and analyzed real-world data from the Women's Health Initiative to evaluate the applicability of ANCOM to DNAm-based deconvolution estimates. Our findings indicate that ANCOM can effectively account for the compositional nature of DNAm-based deconvolution estimates. ANCOM adequately controls the false discovery rate while maintaining statistical power comparable to that of standard methods.


DNA methylation (DNAm)-based deconvolution provides highly accurate estimates of the proportion of each cell type in a mixed-cell type biological sample (e.g., whole blood). These estimates can be used for examining the association between cell type proportions and biological or clinical end points; for example, comparing the estimated neutrophil proportion in whole blood between smokers and non-smokers. Cell proportion data have unique features which present challenges for traditional and widely used statistical methods. In response to this issue, our work presents two simulation studies and a real-world analysis that benchmark the performance of current standard statistical methods against an alternative method called analysis of compositions of microbiomes (ANCOM), which was originally developed for the analysis of microbiome data. In our real-world analysis we used DNAm data collected from the Women's Health Initiative Long Life Study I and compared the results of each method against a gold standard that is typically not available for these analyses. In each of our simulation studies, ANCOM was able to detect true differences in cell proportions between the groups being compared but had a much lower rate of false discovery compared with the standard statistical methods. Our real-world analysis demonstrated similar findings. Overall, our study highlights the potential of ANCOM as a powerful and robust method for analyzing DNAm-derived deconvolution estimates when the interest lies in relating cell type proportions to biological or clinical end points. ANCOM's ability to minimize false discovery while maintaining robust statistical power positions it as a valuable addition to the epigenomic analysis toolkit.

2.
BMC Public Health ; 24(1): 1768, 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38961409

ABSTRACT

BACKGROUND: As components of a 24-hour day, sedentary behavior (SB), physical activity (PA), and sleep are all independently linked to cardiovascular health (CVH). However, insufficient understanding of components' mutual exclusion limits the exploration of the associations between all movement behaviors and health outcomes. The aim of this study was to employ compositional data analysis (CoDA) approach to investigate the associations between 24-hour movement behaviors and overall CVH. METHODS: Data from 581 participants, including 230 women, were collected from the 2005-2006 wave of the US National Health and Nutrition Examination Survey (NHANES). This dataset included information on the duration of SB and PA, derived from ActiGraph accelerometers, as well as self-reported sleep duration. The assessment of CVH was conducted in accordance with the criteria outlined in Life's Simple 7, encompassing the evaluation of both health behaviors and health factors. Compositional linear regression was utilized to examine the cross-sectional associations of 24-hour movement behaviors and each component with CVH score. Furthermore, the study predicted the potential differences in CVH score that would occur by reallocating 10 to 60 min among different movement behaviors. RESULTS: A significant association was observed between 24-hour movement behaviors and overall CVH (p < 0.001) after adjusting for potential confounders. Substituting moderate-to-vigorous physical activity (MVPA) for other components was strongly associated with favorable differences in CVH score (p < 0.05), whether in one-for-one reallocations or one-for-remaining reallocations. Allocating time away from MVPA consistently resulted in larger negative differences in CVH score (p < 0.05). For instance, replacing 10 min of light physical activity (LPA) with MVPA was related to an increase of 0.21 in CVH score (95% confidence interval (95% CI) 0.11 to 0.31). 
Conversely, when the same duration of MVPA was replaced with LPA, CVH score decreased by 0.67 (95% CI -0.99 to -0.35). No significant differences were found for any reallocations involving only LPA, SB, and sleep (p > 0.05). CONCLUSIONS: MVPA seems to be a pivotal determinant for enhancing CVH among the general adult population, relative to other movement behaviors. Consequently, optimization of MVPA duration is an essential element in promoting overall health and well-being.


Subjects
Cardiovascular Diseases , Exercise , Sedentary Behavior , Humans , Female , Male , Middle Aged , Adult , Cardiovascular Diseases/prevention & control , Cross-Sectional Studies , Exercise/physiology , Nutrition Surveys , Time Factors , Sleep/physiology , United States , Aged , Health Behavior
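
The one-for-one reallocations described in entry 2 can be illustrated with a minimal sketch. It is not the authors' regression model: the daily composition below is hypothetical, and the helper names `reallocate` and `log_ratio_change` are assumptions introduced for illustration. The sketch only shows that moving the same 10 minutes into or out of a small part such as MVPA changes its log-ratio scale by different amounts, which is one plausible contributor to the asymmetric differences reported in the abstract.

```python
import math


def reallocate(composition, source, target, minutes):
    """Move `minutes` from `source` to `target`, keeping the daily total fixed."""
    if composition[source] <= minutes:
        raise ValueError("source behavior has too little time to reallocate")
    updated = dict(composition)
    updated[source] -= minutes
    updated[target] += minutes
    return updated


def log_ratio_change(before, after, part):
    """Change in the natural log of one part between two compositions."""
    return math.log(after[part] / before[part])


# Hypothetical 24-hour day in minutes (sums to 1440); not taken from the study.
day = {"sleep": 480, "SB": 600, "LPA": 330, "MVPA": 30}

more_mvpa = reallocate(day, "LPA", "MVPA", 10)
less_mvpa = reallocate(day, "MVPA", "LPA", 10)

gain = log_ratio_change(day, more_mvpa, "MVPA")   # log(40 / 30)
loss = log_ratio_change(day, less_mvpa, "MVPA")   # log(20 / 30)
print(f"gain: {gain:.3f}, loss: {loss:.3f}")
```

On the log scale, removing 10 minutes from a 30-minute part is a larger change than adding 10 minutes to it, so symmetric reallocations in minutes are not symmetric in log-ratio coordinates.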
3.
Annu Rev Stat Appl ; 11(1): 483-504, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38962089

ABSTRACT

The microbiome represents a hidden world of tiny organisms populating not only our surroundings but also our own bodies. By enabling comprehensive profiling of these invisible creatures, modern genomic sequencing tools have given us an unprecedented ability to characterize these populations and uncover their outsize impact on our environment and health. Statistical analysis of microbiome data is critical to infer patterns from the observed abundances. The application and development of analytical methods in this area require careful consideration of the unique aspects of microbiome profiles. We begin this review with a brief overview of microbiome data collection and processing and describe the resulting data structure. We then provide an overview of statistical methods for key tasks in microbiome data analysis, including data visualization, comparison of microbial abundance across groups, regression modeling, and network inference. We conclude with a discussion and highlight interesting future directions.

4.
Microorganisms ; 12(7)2024 Jul 20.
Article in English | MEDLINE | ID: mdl-39065253

ABSTRACT

The relationships among bacterial flora, diseases, and diet have been described by many authors. Operational taxonomic units (OTUs) are the result of clustering 16S rRNA gene sequences at a certain cutoff value, and they are considered compositional data. As Pearson's correlation coefficient is difficult to interpret, Aitchison's ratio analysis was used to develop a method to handle compositional data. Multivariate analysis was developed because univariate analysis can be subject to large biases. Simulations regarding absolute abundance based on certain assumptions and some analyses, such as nonparametric multidimensional scaling (NMDS), principal component analysis (PCA), and ratio analysis, were conducted in this study. The same content as a 100% stacked bar graph could be expressed in low dimensions using PCA. However, the relative diversity was not reproducible with NMDS. Various assumptions were made regarding absolute abundance based on the relative abundance. However, which assumptions are true could not be determined. In summary, ratio analysis and PCA are useful for analyzing compositional data and the gut microbiota.

5.
Brain Inform ; 11(1): 19, 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-38987395

ABSTRACT

Bipolar psychometric scale data are widely used in psychological healthcare. Adequate psychological profiling benefits patients and saves time and costs. Grant funding depends on the quality of psychotherapeutic measures. Bipolar Likert scales yield compositional data because any order of magnitude of agreement towards an item assertion implies a complementary order of magnitude of disagreement. Using an isometric log-ratio (ilr) transformation, the bivariate information can be transformed to the real-valued interval scale, yielding unbiased statistical results and increasing the statistical power of the Pearson correlation significance test if the Central Limit Theorem (CLT) of statistics is satisfied. In practice, however, the applicability of the CLT depends on the number of summands (i.e., the number of items) and the variance of the data generating process (DGP) of the ilr-transformed data. Via simulation we provide evidence that the ilr approach also works satisfactorily if the CLT is violated. That is, the ilr approach is robust towards extremely large or infinite variances of the underlying DGP, increasing the statistical power of the correlation test. The study generalizes former results, pointing out the universality and reliability of the ilr approach in psychometric big data analysis affecting psychometric health economics, patient welfare, grant funding, economic decision making and profits.
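
For a two-part composition such as (agreement, disagreement), the ilr transformation mentioned in entry 5 reduces to a single coordinate, z = ln(x1 / x2) / sqrt(2). The following is a minimal sketch of that formula and its inverse, not the authors' code; the function names and the example response are assumptions introduced for illustration.

```python
import math


def ilr_two_part(agree, disagree):
    """ilr coordinate of a two-part composition (agree, disagree)."""
    if agree <= 0 or disagree <= 0:
        raise ValueError("both parts must be strictly positive")
    return math.log(agree / disagree) / math.sqrt(2)


def ilr_inverse_two_part(z):
    """Map an ilr coordinate back to (agree, disagree) summing to 1."""
    e = math.exp(math.sqrt(2) * z)
    return e / (1 + e), 1 / (1 + e)


# Hypothetical item response: 80% agreement, 20% disagreement.
z = ilr_two_part(0.8, 0.2)
agree, disagree = ilr_inverse_two_part(z)
print(f"ilr = {z:.4f}, back-transformed = ({agree:.2f}, {disagree:.2f})")
```

The coordinate is unbounded on the real line and is zero at the neutral point where agreement equals disagreement, which is why standard interval-scale statistics can be applied to it.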

6.
Front Genet ; 15: 1369628, 2024.
Article in English | MEDLINE | ID: mdl-38903761

ABSTRACT

Genotype-to-phenotype mapping is an essential problem in the current genomic era. While qualitative case-control predictions have received significant attention, less emphasis has been placed on predicting quantitative phenotypes. This emerging field holds great promise in revealing intricate connections between microbial communities and host health. However, the presence of heterogeneity in microbiome datasets poses a substantial challenge to the accuracy of predictions and undermines the reproducibility of models. To tackle this challenge, we investigated 22 normalization methods that aimed at removing heterogeneity across multiple datasets, conducted a comprehensive review of them, and evaluated their effectiveness in predicting quantitative phenotypes in three simulation scenarios and 31 real datasets. The results indicate that none of these methods demonstrate significant superiority in predicting quantitative phenotypes or attain a noteworthy reduction in Root Mean Squared Error (RMSE) of the predictions. Given the frequent occurrence of batch effects and the satisfactory performance of batch correction methods in predicting datasets affected by these effects, we strongly recommend utilizing batch correction methods as the initial step in predicting quantitative phenotypes. In summary, the performance of normalization methods in predicting metagenomic data remains a dynamic and ongoing research area. Our study contributes to this field by undertaking a comprehensive evaluation of diverse methods and offering valuable insights into their effectiveness in predicting quantitative phenotypes.

7.
Am J Epidemiol ; 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38918044

ABSTRACT

Deterministic variables are variables that are functionally determined by one or more parent variables. They commonly arise when a variable has been functionally created from one or more parent variables, as with derived variables, and in compositional data, where the 'whole' variable is determined from its 'parts'. This article introduces how deterministic variables may be depicted within directed acyclic graphs (DAGs) to help with identifying and interpreting causal effects involving derived variables and/or compositional data. We propose a two-step approach in which all variables are initially considered, and a choice is made whether to focus on the deterministic variable or its determining parents. Depicting deterministic variables within DAGs brings several benefits. It is easier to identify and avoid misinterpreting tautological associations, i.e., self-fulfilling associations between deterministic variables and their parents, or between sibling variables with shared parents. In compositional data, it is easier to understand the consequences of conditioning on the 'whole' variable, and correctly identify total and relative causal effects. For derived variables, it encourages greater consideration of the target estimand and greater scrutiny of the consistency and exchangeability assumptions. DAGs with deterministic variables are a useful aid for planning and interpreting analyses involving derived variables and/or compositional data.

8.
Food Chem ; 456: 139916, 2024 Oct 30.
Article in English | MEDLINE | ID: mdl-38876056

ABSTRACT

This research examined the triacylglycerol composition of Iberian pig hams from Sevilla province, focusing on the influence of growing area, season, breed, age, montanera duration, and feeding types. Compositional data analysis (CoDA) tools and standard multivariate statistics were employed to analyse the original and CoDa-transformed data. ANOVA (ilr) and ANCOVA (log ratios) revealed significant effects of season, feeding type, and towns on triacylglycerol profiles, while montanera showed limited or no effect. Breeds and age were deemed irrelevant. Various discriminant analysis (DA) methods consistently distinguished samples from the 2004/2005 season and the cebo feeding type but struggled with other distinctions. PLS-R analysis indicated that bellota feeding was associated with triacylglycerols rich in oleic acid, while cebo was predominantly linked to those containing palmitic and stearic acids. The study challenges traditional assumptions about the effects of montanera, breeds, and age on Iberian pig hams and highlights the need for further investigation.


Subjects
Breeding , Seasons , Triglycerides , Animals , Triglycerides/analysis , Triglycerides/metabolism , Swine/growth & development , Animal Feed/analysis , Meat/analysis , Data Analysis , Discriminant Analysis
9.
Microbiol Spectr ; 12(7): e0410823, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38832899

ABSTRACT

The rapid spread of antimicrobial resistance (AMR) is a threat to global health, and the nature of co-occurring antimicrobial resistance genes (ARGs) may cause collateral AMR effects once antimicrobial agents are used. Therefore, it is essential to identify which pairs of ARGs co-occur. Given the wealth of next-generation sequencing data available in public repositories, we have investigated the correlation between ARG abundances in a collection of 214,095 metagenomic data sets. Using more than 6.76 × 10^8 read fragments aligned to acquired ARGs to infer pairwise correlation coefficients, we found that more ARGs correlated with each other in human and animal sampling origins than in soil and water environments. Furthermore, we argued that the correlations could serve as risk profiles of resistance co-occurring to critically important antimicrobials (CIAs). Using these profiles, we found evidence of several ARGs conferring resistance for CIAs being co-abundant, such as tetracycline ARGs correlating with most other forms of resistance. In conclusion, this study highlights the important ARG players indirectly involved in shaping the resistomes of various environments that can serve as monitoring targets in AMR surveillance programs. IMPORTANCE: Understanding the collateral effects happening in a resistome can reveal previously unknown links between antimicrobial resistance genes (ARGs). Through the analysis of pairwise ARG abundances in 214K metagenomic samples, we observed that the co-abundance is highly dependent on the environmental context and argue that these correlations can be used to show the risk of co-selection occurring in different settings.


Subjects
Anti-Bacterial Agents , Bacteria , Drug Resistance, Bacterial , Metagenomics , Humans , Anti-Bacterial Agents/pharmacology , Bacteria/genetics , Bacteria/drug effects , Bacteria/classification , Drug Resistance, Bacterial/genetics , Animals , Genes, Bacterial/genetics , Soil Microbiology , High-Throughput Nucleotide Sequencing , Metagenome/genetics
10.
BMC Health Serv Res ; 24(1): 565, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38724977

ABSTRACT

BACKGROUND: Prolonged standing at work may contribute to increased risk of musculoskeletal pain in home care workers. Patients' activities of daily living (ADL) score may be a proxy for home care workers' standing time at work. The objective of the present study was to investigate the association between patients' ADL self-care score and workers' standing time. METHODS: This cross-sectional study measured time spent standing, sitting and in physical activity for seven days using thigh-worn accelerometers, among 14 home care workers. Patients' ADL self-care scores are routinely adjusted by home care nurses, and time intervals of home care visits are stored in the home care services' electronic patient journal. We collected ADL self-care scores and start and end time points of visits, and categorized ADL self-care scores as low (ADL ≤ 2.0), medium (ADL > 2.0 to 3.0) or high (ADL > 3.0). Physical behavior data were transformed to isometric log-ratios and a mixed-effect model was used to investigate differences in physical behavior between the three ADL self-care score categories. RESULTS: We analyzed 931 patient visits and found that high ADL self-care scores were associated with longer standing times relative to sitting and physical activity, compared to low ADL scores (0.457, p = 0.001). However, no significant differences in time spent standing were found between high and medium ADL patient visits (0.259, p = 0.260), nor between medium and low (0.204, p = 0.288). High ADL score patients made up 33.4% of the total care time, despite only making up 7.8% of the number of patients. CONCLUSION: Our findings suggest that caring for patients with high ADL self-care scores requires workers to stand for longer durations and that this group of patients constitutes a significant proportion of home care workers' total work time. The findings of this study can inform interventions to improve musculoskeletal health among home care workers by appropriate planning of patient visits.


Subjects
Activities of Daily Living , Home Care Services , Home Health Aides , Self Care , Humans , Cross-Sectional Studies , Male , Female , Norway , Middle Aged , Home Health Aides/statistics & numerical data , Adult , Standing Position , Accelerometry , Musculoskeletal Pain/therapy
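
Entry 10 expresses standing time "relative to sitting and physical activity" through isometric log-ratios. One common way to build such coordinates is with pivot coordinates, where the first coordinate contrasts the first part with the geometric mean of the remaining parts. The sketch below assumes that construction; the abstract does not state which ilr basis the authors used, and the function name and example minutes are hypothetical.

```python
import math


def pivot_ilr(parts):
    """Pivot (ilr) coordinates of a composition with strictly positive parts.

    Coordinate i contrasts part i with the geometric mean of the parts after it.
    """
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(p) for p in parts]
    coords = []
    for i in range(len(parts) - 1):
        rest = logs[i + 1:]
        mean_rest = sum(rest) / len(rest)
        scale = math.sqrt(len(rest) / (len(rest) + 1))
        coords.append(scale * (logs[i] - mean_rest))
    return coords


# Hypothetical visit in minutes: standing, sitting, physical activity.
visit = [8.0, 6.0, 2.0]
z = pivot_ilr(visit)
print("standing vs. rest:", round(z[0], 3))
```

Because the coordinates depend only on ratios, the same values result whether the parts are given in minutes or as proportions of the visit.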
11.
Sci Rep ; 14(1): 12196, 2024 May 28.
Article in English | MEDLINE | ID: mdl-38806627

ABSTRACT

This study introduces a novel groundwater pollution index (GPI) formulated through compositional data analysis (CoDa) and robust principal component analysis (RPCA) to enhance groundwater quality assessment. Using groundwater quality monitoring data from sites impacted by the 2010-2011 foot-and-mouth disease outbreak in South Korea, CoDa uncovers critical hydrochemical differences between leachate-influenced and background groundwater. The GPI was developed by selecting key subcompositional parts (NH₄⁺-N, Cl⁻, and NO₃⁻-N) using RPCA, performing the isometric log-ratio (ILR) transformation, and normalizing the results to environmental standards, thereby providing a more precise and accurate assessment of pollution. Validated against government criteria, the GPI has shown its potential as an alternative assessment tool, with its reliability confirmed by receiver operating characteristic curve analysis. This study highlights the essential role of CoDa, especially the ILR transformation, in overcoming the limitations of traditional statistical methods that often neglect the relative nature of hydrochemical data. Our results emphasize the utility of the GPI in significantly advancing groundwater quality monitoring and management by addressing a methodological gap in the quantitative assessment of groundwater pollution.

12.
Environ Geochem Health ; 46(6): 192, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38696062

ABSTRACT

Urban areas are characterized by a constant anthropogenic input, which is manifested in the chemical composition of the surface layer of urban soil. The consequence is the formation of intense anomalies of chemical elements, including lead (Pb), that are atypical for this landscape. Therefore, this study aims to explore the compositional-geochemical characteristics of soil Pb anomalies in the urban areas of Yerevan, Gyumri, and Vanadzor, and to identify the geochemical associations of Pb that emerge under prevalent anthropogenic influences in these urban areas. The results obtained through the combined use of compositional data analysis and geospatial mapping showed that the investigated Pb anomalies in different cities form source-specific geochemical associations influenced by historical and ongoing activities, as well as the natural geochemical behavior of chemical elements occurring in these areas. Specifically, in Yerevan, Pb was closely linked with Cu and Zn, forming a group of persistent anthropogenic tracers of urban areas. In contrast, in Gyumri and Vanadzor, Pb was linked with Ca, suggesting that over decades, complexation of Pb by Ca carbonates occurred. These patterns of compositional-geochemical characteristics of Pb anomalies are directly linked to the socio-economic development of cities and the various emission sources present in their environments during different periods. The human health risk assessment showed that children are at Pb-induced non-carcinogenic risk with a certainty of 63.59% in Yerevan and 50% in both Gyumri and Vanadzor.


Subjects
Cities , Lead , Soil Pollutants , Lead/analysis , Soil Pollutants/analysis , Humans , Risk Assessment , Environmental Monitoring/methods , Soil/chemistry , Environmental Exposure , Child , Ukraine
13.
Stat Methods Med Res ; 33(6): 1043-1054, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38654396

ABSTRACT

Ordinal responses are commonly found in medicine, biology, and other fields. In many situations, the predictors for this ordinal response are compositional, which means that the sum of predictors for each sample is fixed. Examples of compositional data include the relative abundance of species in microbiome data and the relative frequency of nutrition concentrations. Moreover, the predictors that are strongly correlated tend to have similar influence on the response outcome. Conventional cumulative logistic regression models for ordinal responses ignore the fixed-sum constraint on predictors and their associated interrelationships, and thus are not appropriate for analyzing compositional predictors. To solve this problem, we proposed Bayesian Compositional Models for Ordinal Response to analyze the relationship between compositional data and an ordinal response with a structured regularized horseshoe prior for the compositional coefficients and a soft sum-to-zero restriction on coefficients through the prior distribution. The method was implemented with the R package rstan using an efficient Hamiltonian Monte Carlo algorithm. We performed simulations to compare the proposed approach and existing methods for ordinal responses. Results revealed that our proposed method outperformed the existing methods in terms of parameter estimation and prediction. We also applied the proposed method to a microbiome study, HMP2Data, to find microorganisms linked to ordinal inflammatory bowel disease levels. To make this work reproducible, the code and data used in this paper are available at https://github.com/Li-Zhang28/BCO.


Subjects
Algorithms , Bayes Theorem , Microbiota , Models, Statistical , Monte Carlo Method , Humans , Inflammatory Bowel Diseases , Computer Simulation , Logistic Models
14.
Sci Total Environ ; 933: 172398, 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-38677437

ABSTRACT

Soil contamination in outdoor shooting ranges (OSRs) is a major threat to human health, particularly when, after the end of activities, the land is used for recreational areas or agricultural production. The status of land degradation of an OSR in southern Italy was assessed using a multisensor approach. It was based on: i) proximal sensors, including electromagnetic induction (EMI) for measuring soil electrical conductivity (ECa) and magnetic susceptibility (MSa), γ-ray spectrometry for K, eU and eTh analyses and ultrasonic penetrometry providing cone index (CI) data representative of soil strength, ii) field surveys on soil thickness (ST), and iii) laboratory analyses of potentially toxic elements (PTEs) by portable X-ray fluorescence spectrometry and polycyclic aromatic hydrocarbons (PAHs) by gas chromatography. Spatial variability of measurements was modelled and mapped using geostatistical methods. The most densely measured covariate (i.e., the ECa of the topsoil) was used within kriging with external drift to improve the PTE predictions. The PTE maps were complemented by maps of spatial uncertainty. A robust multivariate principal component analysis (rPCA) was applied to proximal sensor and laboratory data and made it possible to identify associations of PAHs, lead, and CI with the topsoil ECa along the first component (PC1), highlighting the correlation between anthropogenic effects on the land and EMI measures; the association between the ST (estimating the depth of underground travertine hard layers) and the bottom soil ECa and MSa along the second component (PC2) evidenced the influence of soil stratigraphy on the EMI measures. This study demonstrates that the simultaneous use of different proximal sensors combined with laboratory analysis makes it possible to assess and model the spatial variability of the land degradation status of an OSR, including soil compaction and organic and inorganic contamination. The correlation between EMI data and PTE content highlights the potential of this technique in the field of soil contamination.

15.
Ann Work Expo Health ; 68(5): 522-534, 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38603465

ABSTRACT

OBJECTIVES: This study aimed to explore the association between arm elevation and neck/shoulder pain, and trunk forward bending and low back pain among home care workers. METHODS: Home care workers (N = 116) from 11 home care units in Trondheim, Norway, filled in a pain assessment and working hours questionnaire, and wore 3 accelerometers for up to 7 consecutive days. Work time was partitioned into upright awkward posture, nonawkward posture, and nonupright time, i.e. sitting. Within a compositional approach framework, posture time compositions were expressed in terms of log-ratio coordinates for statistical analysis and modeling. Poisson generalized linear mixed models were used to analyze the relationship between arm elevation in upright postures and neck/shoulder pain, and between trunk forward bending in upright postures and low back pain, respectively. Isotemporal substitution analysis was used to investigate the association of pain assessment with the reallocation of time spent in the different postures. RESULTS: Time spent in awkward postures was modest, especially for the more extreme angles (60° and 90°). Adjusting for age, gender, and body mass index, our study suggested that the compositions of time spent by home care workers in awkward postures were significantly associated with pain assessment (P < 0.01). Isotemporal substitution analysis showed that reallocating 5 min from upright posture with arms elevated below to above 60° and 90° was associated with a 6.8% and 19.9% increase in the neck/shoulder pain score, respectively. Reallocating 5 min from a forward bending posture while upright below to above 30°, 60°, and 90° was associated with 1.8%, 3.5%, and 4.0% increases in low back pain, respectively. CONCLUSIONS: Although the exposure to awkward postures was modest, our results showed an association between increased time spent in awkward postures and an increase in neck/shoulder pain and low back pain in home care workers. As musculoskeletal pain is the leading cause of sickness absence, these findings suggest that home care units could benefit from re-organizing work to avoid excessive arm elevation and trunk forward bending in workers.


Subjects
Musculoskeletal Pain , Occupational Diseases , Posture , Shoulder Pain , Humans , Posture/physiology , Male , Female , Adult , Musculoskeletal Pain/etiology , Middle Aged , Occupational Diseases/etiology , Shoulder Pain/etiology , Norway , Low Back Pain/etiology , Surveys and Questionnaires , Neck Pain/etiology , Home Care Services , Accelerometry , Occupational Exposure/analysis , Occupational Exposure/adverse effects , Pain Measurement/methods
16.
Sci Rep ; 14(1): 8494, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38605041

ABSTRACT

Effective forecasting of energy consumption structure is vital for China to reach its "dual carbon" objective. However, existing studies have paid little attention to the holistic nature and internal properties of energy consumption structure. Therefore, this paper incorporates the theory of compositional data into the study of energy consumption structure, which not only takes into account the specificity of the internal features of the structure, but also digs deeper into the relative information. Meanwhile, based on minimizing the squared Aitchison distance for compositional data, a combined model based on three single models, namely the metabolism grey model (MGM), back-propagation neural network (BPNN) model, and autoregressive integrated moving average (ARIMA) model, is constructed in this paper. The forecast results of the energy consumption structure in 2023-2040 indicate that the future energy consumption structure of China will evolve towards a more diversified pattern, but the proportion of natural gas and non-fossil energy has yet to meet the policy goals set by the government. This paper not only suggests that combined prediction models for compositional data have high applicability in the energy sector, but also has some theoretical significance for adapting and improving the energy consumption structure in China.

17.
Talanta ; 274: 125954, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38599113

ABSTRACT

Complex matrices such as soil have a range of measurable characteristics, and thus data to describe them can be considered multidimensional. These characteristics can be strongly influenced by factors that introduce confounding effects that hinder analyses. Traditional statistical approaches lack the flexibility and granularity required to adequately evaluate such matrices, particularly those with large datasets of varying data types (i.e. quantitative non-compositional, quantitative compositional). We present a statistical workflow designed to effectively analyse complex, multidimensional systems, even in the presence of confounding variables. The developed methodology involves exploratory analysis to identify the presence of confounding variables, followed by data decomposition (including strategies for both compositional and non-compositional quantitative data) to minimise the influence of confounding factors such as sampling site/location. These data processing methods then allow common patterns in the data to be highlighted, including the identification of biomarkers and the determination of non-trivial associations between variables. We demonstrate the utility of this statistical workflow by jointly analysing the chemical composition and fungal biodiversity of New Zealand vineyard soils that have been managed with either organic low-input or conventional input approaches. By applying this pipeline, we were able to identify biomarkers that distinguish viticultural soils managed under the two approaches and also unearth links and associations between the chemical and metagenomic profiles. While soil is an example of a system that can require this type of statistical methodology, there are a range of biological and ecological systems that are challenging to analyse due to the complex interplay of global and local effects. Utilising our developed pipeline will greatly enhance the way that these systems can be studied and the quality and impact of insight gained from their analysis.


Subjects
Soil , Soil/chemistry , Soil Microbiology , Fungi , Biodiversity , New Zealand
18.
Microbiome ; 12(1): 45, 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38443997

ABSTRACT

BACKGROUND: Normalization, as a pre-processing step, can significantly affect the resolution of machine learning analysis for microbiome studies. There are countless options for normalization scheme selection. In this study, we examined compositionally aware algorithms including the additive log ratio (alr), the centered log ratio (clr), and a recent evolution of the isometric log ratio (ilr) in the form of balance trees made with the PhILR R package. We also looked at compositionally naïve transformations such as raw counts tables and several transformations that are based on relative abundance, such as proportions, the Hellinger transformation, and a transformation based on the logarithm of proportions (which we call "lognorm"). RESULTS: In our evaluation, we used 65 metadata variables culled from four publicly available datasets at the amplicon sequence variant (ASV) level with a random forest machine learning algorithm. We found that different common pre-processing steps in the creation of the balance trees made very little difference in overall performance. Overall, we found that the compositionally aware data transformations such as alr, clr, and ilr (PhILR) performed generally slightly worse or only as well as compositionally naïve transformations. However, relative abundance-based transformations outperformed most other transformations by a small but reliably statistically significant margin. CONCLUSIONS: Our results suggest that minimizing the complexity of transformations while correcting for read depth may be a generally preferable strategy in preparing data for machine learning compared to more sophisticated, but more complex, transformations that attempt to better correct for compositionality. Video Abstract.
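To make the contrast between compositionally naive and compositionally aware transformations concrete, here is a minimal sketch of three of the transformations named in the abstract, applied to one sample's count vector: proportions, the Hellinger transformation, and clr. It is an illustration under assumptions, not the study's pipeline: the pseudocount of 1 used to avoid log(0) in clr is an assumed choice, the counts are made up, and "lognorm" is omitted because the abstract does not give its exact formula.

```python
import math

def proportions(counts):
    """Relative abundance: each count divided by the sample total (read depth)."""
    total = sum(counts)
    return [c / total for c in counts]

def hellinger(counts):
    """Hellinger transformation: square root of the relative abundances."""
    return [math.sqrt(p) for p in proportions(counts)]

def clr(counts, pseudocount=1.0):
    """Centered log-ratio of the counts.

    The pseudocount is an assumption made here so that zero counts
    do not produce log(0); other zero-handling strategies exist.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical ASV counts for a single sample.
sample = [120, 0, 35, 845]
features = {
    "proportions": proportions(sample),
    "hellinger": hellinger(sample),
    "clr": clr(sample),
}
```

Each transformed vector could then be passed as a feature row to a classifier such as a random forest. Proportions and Hellinger values only rescale for read depth, whereas clr re-expresses every feature relative to the geometric mean of the sample.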


Subjects
Algorithms , Microbiota , Machine Learning , Microbiota/genetics
19.
BMC Bioinformatics ; 25(1): 90, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38429687

ABSTRACT

RNA sequencing of time-course experiments results in three-way count data where the dimensions are the genes, the time points and the biological units. Clustering RNA-seq data makes it possible to extract groups of genes that are co-expressed over time. After standardisation, the normalised counts of individual genes across time points and biological units have properties similar to those of compositional data. We propose the following procedure to suitably cluster three-way RNA-seq data: (1) pre-process the RNA-seq data by calculating the normalised expression profiles, (2) transform the data using the additive log ratio transform to map the composition in the D-part Aitchison simplex to a (D - 1)-dimensional Euclidean vector, (3) cluster the transformed RNA-seq data using matrix-variate Gaussian mixture models and (4) assess the quality of the overall cluster solution and of individual clusters based on cluster separation in the transformed space using density-based silhouette information and on compactness of the clusters in the original space using cluster maps as a suitable visualisation. The proposed procedure is illustrated on RNA-seq data from fission yeast, and the results are also compared to an analogous two-way approach after flattening out the biological units.
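Steps (1) and (2) of the procedure can be sketched for a single gene as follows. This is a minimal illustration, assuming that the normalised expression profile is the gene's counts across time points divided by their sum and that the last time point serves as the alr reference part; neither choice is stated in the abstract, and the counts are made up. Clustering (step 3) and cluster assessment (step 4) are not shown.

```python
import math

def normalised_profile(counts):
    """Step (1): divide a gene's counts across time points by their total."""
    total = sum(counts)
    return [c / total for c in counts]

def alr(composition, ref=-1):
    """Step (2): additive log ratio of a D-part composition.

    Returns D - 1 coordinates, log(x_i / x_ref), for every part except
    the reference (assumed here to be the last part by default).
    """
    d = len(composition)
    ref = ref % d
    return [
        math.log(composition[i] / composition[ref])
        for i in range(d)
        if i != ref
    ]

def alr_inverse(coords):
    """Map alr coordinates (last part as reference) back to the simplex."""
    expd = [math.exp(c) for c in coords] + [1.0]
    total = sum(expd)
    return [v / total for v in expd]

# Hypothetical counts of one gene at D = 4 time points (all positive).
counts = [30, 60, 90, 20]
profile = normalised_profile(counts)   # point in the 4-part simplex
coords = alr(profile)                  # 3-dimensional Euclidean vector
recovered = alr_inverse(coords)        # back in the simplex
```

Because the alr is invertible, clusters found in the transformed (D - 1)-dimensional space can be mapped back and inspected as profiles in the original space, which is what the compactness check in step (4) relies on. Zero counts would need separate handling before taking logarithms.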


Subjects
RNA , RNA/genetics , Sequence Analysis, RNA/methods , RNA-Seq , Base Sequence , Cluster Analysis
20.
Nutrients ; 16(6)2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38542793

ABSTRACT

Protein intake reportedly increases the risk of diabetes; however, the results have been inconsistent. Diabetes in adulthood may be attributed to early life dietary amino acid composition. This study aimed to investigate the association between amino acid composition and glycemic biomarkers in adolescents. Dietary intake was assessed using a food frequency questionnaire, and fasting glucose and insulin levels were measured in 1238 eighth graders. The homeostatic model assessment (HOMA) indices (insulin resistance and β-cell function) were calculated. Anthropometrics were measured and other covariates were obtained from a questionnaire. Amino acid composition was isometric log-ratio transformed, following compositional data analysis, and the transformed values were used as explanatory variables in multivariate linear regression models for glucose, insulin, and HOMA indices. Only the association between glucose and leucine was significant. When leucine replaced other amino acids, an increase of 0.1% of total amino acids correlated with a lower glucose level (-1.02 mg/dL). One-to-one substitution of leucine for isoleucine or methionine decreased glucose (-2.98 and -2.28 mg/dL, respectively). Associations with other biomarkers were not observed. In conclusion, compositional data analysis of amino acids revealed an association only with glucose in adolescents; however, the results of this study should be verified in other populations.
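The abstract does not say which isometric log-ratio basis was used, so the sketch below assumes one common choice, pivot coordinates, in which the first coordinate contrasts one part of interest (placed first, here leucine) with the geometric mean of all remaining parts. The function name and the amino acid shares are hypothetical and serve only to show the shape of the transformation that would feed a regression model.

```python
import math

def pivot_ilr(x):
    """Isometric log-ratio transform in pivot coordinates (assumed basis).

    For a D-part composition, coordinate i (0-based) is
        sqrt(k / (k + 1)) * (log x_i - mean(log x_{i+1}, ..., log x_{D-1})),
    where k = D - 1 - i is the number of parts after x_i.
    Returns D - 1 coordinates; the first isolates the first part.
    """
    logs = [math.log(v) for v in x]
    coords = []
    for i in range(len(x) - 1):
        rest = logs[i + 1:]
        k = len(rest)
        coords.append(math.sqrt(k / (k + 1)) * (logs[i] - sum(rest) / k))
    return coords

# Hypothetical shares of total amino acid intake, part of interest first:
# leucine, isoleucine, methionine, all other amino acids.
shares = [0.08, 0.05, 0.02, 0.85]
z = pivot_ilr(shares)
leucine_vs_rest = z[0]  # candidate explanatory variable for a regression
```

In a regression on these coordinates, the coefficient of the first coordinate describes the effect of increasing the first part relative to all others, which is one way to read a statement about replacing other amino acids with leucine. Reordering the parts so that a different amino acid comes first gives the corresponding coordinate for that amino acid.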


Subjects
Diabetes Mellitus , Insulin Resistance , Humans , Adolescent , Leucine , Japan , Blood Glucose/metabolism , Insulin , Insulin Resistance/physiology , Amino Acids , Glucose , Biomarkers