Results 1-20 of 397
1.
Mach Learn ; 113(7): 3961-3997, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39221170

ABSTRACT

There is a growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when to treat and how to treat based on the user's context (e.g., prior activity level, location, etc.). Online RL is a promising data-driven approach for this problem as it learns based on each user's historical responses and uses that knowledge to personalize these decisions. However, to decide whether the RL algorithm should be included in an "optimized" intervention for real-world deployment, we must assess the data evidence indicating that the RL algorithm is actually personalizing the treatments to its users. Due to the stochasticity in the RL algorithm, one may get a false impression that it is learning in certain states and using this learning to provide specific treatments. We use a working definition of personalization and introduce a resampling-based methodology for investigating whether the personalization exhibited by the RL algorithm is an artifact of the RL algorithm's stochasticity. We illustrate our methodology with a case study by analyzing the data from a physical activity clinical trial called HeartSteps, which included the use of an online RL algorithm. We demonstrate how our approach enhances data-driven truth-in-advertising of algorithm personalization both across all users and within specific users in the study.
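
Illustrative sketch (not the HeartSteps analysis): the core resampling idea can be mimicked by computing a personalization statistic on logged (context, action) pairs and comparing it to a null distribution obtained by breaking the context-action link while preserving the algorithm's stochastic treatment rate. The binary context, the gap statistic, and the permutation scheme below are all simplifying assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def personalization_stat(states, actions):
        # Gap in treatment rate between two context states; a large gap suggests
        # the policy treats the contexts differently (i.e., "personalizes").
        return actions[states == 1].mean() - actions[states == 0].mean()

    def resampling_null(states, actions, n_resamples=2000):
        # Permute contexts to break any context-action link while keeping the
        # marginal treatment rate (the algorithm's stochasticity) intact.
        return np.array([personalization_stat(rng.permutation(states), actions)
                         for _ in range(n_resamples)])

    # Toy per-user log: binary context and binary treatment decisions.
    states = rng.integers(0, 2, size=400)
    actions = (rng.random(400) < np.where(states == 1, 0.7, 0.4)).astype(int)

    obs = personalization_stat(states, actions)
    null = resampling_null(states, actions)
    print(f"observed gap = {obs:.3f}, resampling p-value = {(np.abs(null) >= abs(obs)).mean():.3f}")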

2.
Stud Hist Philos Sci ; 107: 1-10, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39106538

ABSTRACT

We propose that the epistemic functions of replication in science are best understood by relating them to kinds of experimental error/uncertainty. One kind of replication, which we call "direct replications," principally serves to assess the reliability of an experiment through its precision: the presence and degree of random error/statistical uncertainty. The other kind of replication, which we call "conceptual replications," principally serves to assess the validity of an experiment through its accuracy: the presence and degree of systematic errors/uncertainties. To illustrate the aptness of this general view, we examine the Hubble constant controversy in astronomy, showing how astronomers have responded to the concordances and discordances in their results by carrying out the different kinds of replication that we identify, with the aim of establishing a precise, accurate value for the Hubble constant. We contrast our view with Machery's "re-sampling" account of replication, which maintains that replications only assess reliability.

3.
Methods ; 230: 99-107, 2024 Aug 02.
Article in English | MEDLINE | ID: mdl-39097178

ABSTRACT

Many studies have demonstrated the importance of accurately identifying miRNA-disease associations (MDAs) for understanding disease mechanisms. However, the number of known MDAs is significantly fewer than the unknown pairs. Here, we propose RSANMDA, a subview attention network for predicting MDAs. We first extract miRNA and disease features from multiple similarity matrices. Next, using resampling techniques, we generate different subviews from known MDAs. Each subview undergoes multi-head graph attention to capture its features, followed by semantic attention to integrate features across subviews. Finally, combining raw and training features, we use a multilayer scoring perceptron for prediction. In the experimental section, we conducted comparative experiments with other advanced models on both HMDD v2.0 and HMDD v3.2 datasets. We also performed a series of ablation studies and parameter tuning exercises. Comprehensive experiments conclusively demonstrate the superiority of our model. Case studies on lung, breast, and esophageal cancers further validate our method's predictive capability for identifying disease-related miRNAs.

4.
Brief Funct Genomics ; 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39173096

ABSTRACT

Genome-wide association study (GWAS) is essential for investigating the genetic basis of complex diseases; nevertheless, it usually ignores the interaction of multiple single nucleotide polymorphisms (SNPs). Genome-wide interaction studies provide crucial means for exploring complex genetic interactions that GWAS may miss. Although many interaction methods have been proposed, challenges still persist, including the lack of epistasis models and the inconsistency of benchmark datasets. SNP data simulation is a pivotal intermediary between interaction methods and real applications. Therefore, it is important to obtain epistasis models and benchmark datasets by simulation tools, which is helpful for further improving interaction methods. At present, many simulation tools have been widely employed in the field of population genetics. According to their basic principles, these existing tools can be divided into four categories: coalescent simulation, forward-time simulation, resampling simulation, and other simulation frameworks. In this paper, their basic principles and representative simulation tools are compared and analyzed in detail. Additionally, this paper provides a discussion and summary of the advantages and disadvantages of these frameworks and tools, offering technical insights for the design of new methods, and serving as valuable reference tools for researchers to comprehensively understand GWAS and genome-wide interaction studies.

5.
Front Digit Health ; 6: 1430245, 2024.
Article in English | MEDLINE | ID: mdl-39131184

ABSTRACT

Multi-class classification problems, particularly those with imbalanced class distributions, have attracted growing attention. To address these challenges, various strategies, including data-level re-sampling and ensemble methods, have been introduced to bolster the performance of predictive models and Artificial Intelligence (AI) algorithms when a high level of imbalance is present. While most research and algorithm development has focused on binary classification, there is increasing interest in health informatics in addressing multi-class classification on imbalanced datasets. Multi-class imbalance poses more complex challenges, as a delicate approach is required to generate synthetic data while maintaining the relationships among the multiple classes. The aim of this review paper is to examine over-sampling methods tailored for medical and other datasets with multi-class imbalance. Of 2,076 peer-reviewed papers identified through searches, 197 eligible papers were reviewed for inclusion and 37 studies were selected for in-depth analysis. These studies fall into four categories: metric, adaptive, structure-based, and hybrid approaches. The most significant finding is an emerging trend toward hybrid resampling methods that combine the strengths of several techniques to address imbalanced data effectively. This paper provides an extensive analysis of each selected study, discusses their findings, and outlines directions for future research.

6.
BMC Med Res Methodol ; 24(1): 189, 2024 Aug 29.
Article in English | MEDLINE | ID: mdl-39210285

ABSTRACT

BACKGROUND: Accurate prediction of subject recruitment, which is critical to the success of a study, remains an ongoing challenge. Previous prediction models often rely on parametric assumptions which are not always met or may be difficult to implement. We aim to develop a novel method that is less sensitive to model assumptions and relatively easy to implement. METHODS: We create a weighted resampling-based approach to predict enrollment in year two based on recruitment data from year one of the completed GRIPS and PACE clinical trials. Different weight functions accounted for a range of potential enrollment trajectory patterns. Prediction accuracy was measured by Euclidean distance for the enrollment sequence in year two, total enrollment over time, and total weeks to enroll a fixed number of subjects, against the actual year two enrollment data. We compare the performance of the proposed method with an existing Bayesian method. RESULTS: Weighted resampling using GRIPS data resulted in closer prediction, evidenced by better coverage of observed enrollment by the prediction intervals and smaller Euclidean distance from actual enrollment in year 2, especially when enrollment gaps were filled prior to the weighted resampling. These scenarios also produced more accurate predictions for total enrollment and the number of weeks to enroll 50 participants. These same scenarios outperformed an existing Bayesian method on all 3 accuracy measures. In PACE data, using a reduced year 1 enrollment resulted in closer prediction, evidenced by better coverage of observed enrollment by the prediction intervals and smaller Euclidean distance from actual enrollment in year 2, with the weighted resampling scenarios better reflecting the seasonal variation seen in year 1. The reduced enrollment scenarios resulted in closer prediction for total enrollment over 6 and 12 months into year 2. These same scenarios also outperformed an existing Bayesian method on the relevant accuracy measures. CONCLUSION: The results demonstrate the feasibility and flexibility of a resampling-based, non-parametric approach for predicting clinical trial recruitment with limited early enrollment data. Application to a wider setting and long-term prediction accuracy require further investigation.
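
Illustrative sketch (assumed weight function and toy counts, not the GRIPS/PACE implementation): weighted resampling of year-1 weekly enrollment counts, with more weight on recent weeks, can generate simulated year-2 trajectories and a prediction interval for total enrollment.

    import numpy as np

    rng = np.random.default_rng(1)

    def simulate_year2(weekly_year1, n_sims=5000, weight_power=1.0):
        # Weight later year-1 weeks more heavily (one plausible weight function),
        # then resample 52 weekly counts with replacement to build year-2 paths.
        weeks = np.arange(1, len(weekly_year1) + 1, dtype=float)
        w = weeks ** weight_power
        w /= w.sum()
        draws = rng.choice(weekly_year1, size=(n_sims, 52), replace=True, p=w)
        return draws.cumsum(axis=1)                      # cumulative enrollment per path

    weekly_year1 = rng.poisson(2.0, size=52)             # toy year-1 weekly enrollment
    totals = simulate_year2(weekly_year1)[:, -1]
    lo, hi = np.percentile(totals, [2.5, 97.5])
    print(f"predicted year-2 total: median {np.median(totals):.0f}, 95% PI [{lo:.0f}, {hi:.0f}]")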


Subjects
Bayes Theorem, Patient Selection, Randomized Controlled Trials as Topic, Humans, Randomized Controlled Trials as Topic/methods, Randomized Controlled Trials as Topic/statistics & numerical data, Aged, Inpatients/statistics & numerical data, Nonparametric Statistics, Female
7.
PeerJ Comput Sci ; 10: e2119, 2024.
Article in English | MEDLINE | ID: mdl-38983189

ABSTRACT

Background: Missing data are common when analyzing real data. One popular solution is to impute missing data so that one complete dataset can be obtained for subsequent data analysis. In the present study, we focus on missing data imputation using classification and regression trees (CART). Methods: We consider a new perspective on missing data in a CART imputation problem and realize the perspective through some resampling algorithms. Several existing missing data imputation methods using CART are compared through simulation studies, and we aim to investigate the methods with better imputation accuracy under various conditions. Some systematic findings are demonstrated and presented. These imputation methods are further applied to two real datasets: Hepatitis data and Credit approval data for illustration. Results: The method that performs the best strongly depends on the correlation between variables. For imputing missing ordinal categorical variables, the rpart package with surrogate variables is recommended under correlations larger than 0 with missing completely at random (MCAR) and missing at random (MAR) conditions. Under missing not at random (MNAR), chi-squared test methods and the rpart package with surrogate variables are suggested. For imputing missing quantitative variables, the iterative imputation method is most recommended under moderate correlation conditions.
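
Rough Python analogue (the study itself compares R implementations such as the rpart package): iterative imputation with a CART-style decision tree as the conditional model, scored on artificially masked entries. The data, masking rate, and tree depth are arbitrary choices for illustration.

    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(2)

    # Toy data with correlated columns and ~20% of values missing completely at random.
    n = 500
    x1 = rng.normal(size=n)
    X = np.column_stack([x1, 0.8 * x1 + rng.normal(scale=0.5, size=n), rng.normal(size=n)])
    mask = rng.random(X.shape) < 0.2
    X_missing = np.where(mask, np.nan, X)

    # Iterative (chained-equations) imputation with a tree as the per-column model.
    imputer = IterativeImputer(estimator=DecisionTreeRegressor(max_depth=5, random_state=0),
                               max_iter=10, random_state=0)
    X_imputed = imputer.fit_transform(X_missing)

    print(f"RMSE on masked entries: {np.sqrt(((X_imputed[mask] - X[mask]) ** 2).mean()):.3f}")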

8.
J Clin Epidemiol ; 174: 111485, 2024 Jul 26.
Article in English | MEDLINE | ID: mdl-39069013

ABSTRACT

BACKGROUND AND OBJECTIVE: The minimum sample size for multistakeholder Delphi surveys remains understudied. Drawing from three large international multistakeholder Delphi surveys, this study aimed to: 1) investigate the effect of increasing sample size on replicability of results; 2) assess whether the level of replicability of results differed with participant characteristics, for example, gender, age, and profession. METHODS: We used data from Delphi surveys to develop guidance for improved reporting of health-care intervention trials: the SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials) and CONSORT (Consolidated Standards of Reporting Trials) extension for surrogate end points (n = 175, 22 items rated); CONSORT-SPI [CONSORT extension for Social and Psychological Interventions] (n = 333, 77 items rated); and a core outcome set for burn care (n = 553, 88 items rated). Resampling with replacement was used to draw random subsamples from the participant data set in each of the three surveys. For each subsample, the median value of all rated survey items was calculated and compared to the medians from the full participant data set. The median number (and interquartile range) of medians replicated was used to calculate the percentage replicability (and variability). High replicability was defined as ≥80% and moderate as ≥60% and <80%. RESULTS: The average median replicability (variability) as a percentage of the total number of items rated from the three datasets was 81% (10%) at a sample size of 60. In one of the datasets (CONSORT-SPI), ≥80% replicability was reached at a sample size of 80. On average, increasing the sample size from 80 to 160 increased the replicability of results by a further 3% and reduced variability by 1%. For subgroup analyses based on participant characteristics (eg, gender, age, professional role), resampled subsamples of 20 to 100 showed that a sample size of 20 to 30 resulted in moderate replicability levels of 64% to 77%. CONCLUSION: We found that a minimum sample size of 60-80 participants in multistakeholder Delphi surveys provides a high level of replicability (≥80%) in the results. For Delphi studies limited to individual stakeholder groups (such as researchers, clinicians, patients), a sample size of 20 to 30 per group may be sufficient.
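
Illustrative sketch of the resampling-with-replacement check on toy ratings (the real survey panels and rating scales are not reproduced here): draw a subsample of participants, recompute per-item medians, and count the proportion that match the full-panel medians.

    import numpy as np

    rng = np.random.default_rng(3)

    def replicability(ratings, subsample_size, n_resamples=1000):
        # ratings: (n_participants, n_items) matrix of Likert-style scores.
        full_medians = np.median(ratings, axis=0)
        n = ratings.shape[0]
        pct = np.empty(n_resamples)
        for b in range(n_resamples):
            idx = rng.integers(0, n, size=subsample_size)   # sample with replacement
            pct[b] = (np.median(ratings[idx], axis=0) == full_medians).mean() * 100
        return np.median(pct), np.percentile(pct, [25, 75])

    ratings = rng.integers(1, 10, size=(300, 40))           # toy multistakeholder panel
    for size in (20, 60, 80):
        med, iqr = replicability(ratings, size)
        print(f"n={size}: median replicability {med:.0f}% (IQR {iqr[0]:.0f}-{iqr[1]:.0f}%)")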

9.
Methods Mol Biol ; 2812: 155-168, 2024.
Article in English | MEDLINE | ID: mdl-39068361

ABSTRACT

This chapter shows how to apply the Asymmetric Within-Sample Transformation to single-cell RNA-Seq data, paired with a prior dropout imputation. The asymmetric transformation is a special winsorization that flattens low-expressed intensities and preserves highly expressed gene levels. Before a standard hierarchical clustering algorithm, an intermediate step removes noninformative genes according to a threshold applied to a per-gene entropy estimate. Following the clustering, a time-intensive algorithm is used to uncover the molecular features associated with each cluster. This step implements a resampling algorithm to generate a random baseline against which significantly up- and downregulated genes are measured. To this aim, we adopt a GLM model as implemented in the DESeq2 package. We render the results graphically. While the tools are standard heat maps, we introduce some data scaling to clarify the reliability of the results.
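
Generic sketch of the resampling-baseline idea (the chapter itself uses a GLM via the R package DESeq2; the label-permutation statistic below is a simplified stand-in): permute cluster labels to obtain a random baseline for per-gene differences and flag genes whose observed difference exceeds it.

    import numpy as np

    rng = np.random.default_rng(4)

    def marker_pvalues(expr, labels, cluster, n_resamples=1000):
        # expr: (cells, genes) expression matrix; labels: cluster assignment per cell.
        in_c = labels == cluster
        obs = expr[in_c].mean(axis=0) - expr[~in_c].mean(axis=0)
        null = np.empty((n_resamples, expr.shape[1]))
        for b in range(n_resamples):
            perm = rng.permutation(labels) == cluster       # resampled random baseline
            null[b] = expr[perm].mean(axis=0) - expr[~perm].mean(axis=0)
        # Two-sided empirical p-value per gene against the resampled baseline.
        return (np.abs(null) >= np.abs(obs)).mean(axis=0)

    expr = rng.poisson(1.0, size=(200, 50)).astype(float)   # toy count matrix
    labels = rng.integers(0, 3, size=200)
    expr[labels == 0, :5] += 2.0                            # spike 5 genes in cluster 0
    print("candidate markers for cluster 0:", np.where(marker_pvalues(expr, labels, 0) < 0.05)[0])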


Subjects
Algorithms, Single-Cell Analysis, Single-Cell Analysis/methods, Cluster Analysis, Humans, Gene Expression Profiling/methods, Software, Computational Biology/methods, RNA-Seq/methods
10.
Sensors (Basel) ; 24(12)2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38931742

ABSTRACT

Corn (Zea mays L.) is the most abundant food/feed crop, making accurate yield estimation a critical data point for monitoring global food production. Sensors with varying spatial/spectral configurations have been used to develop corn yield models from intra-field (0.1 m ground sample distance (GSD)) to regional scales (>250 m GSD). Understanding the spatial and spectral dependencies of these models is imperative to result interpretation, scaling, and deploying models. We leveraged high spatial resolution hyperspectral data collected with an unmanned aerial system mounted sensor (272 spectral bands from 0.4-1 µm at 0.063 m GSD) to estimate silage yield. We subjected our imagery to three band selection algorithms to quantitatively assess spectral reflectance features applicability to yield estimation. We then derived 11 spectral configurations, which were spatially resampled to multiple GSDs, and applied to a support vector regression (SVR) yield estimation model. Results indicate that accuracy degrades above 4 m GSD across all configurations, and a seven-band multispectral sensor which samples the red edge and multiple near-infrared bands resulted in higher accuracy in 90% of regression trials. These results bode well for our quest toward a definitive sensor definition for global corn yield modeling, with only temporal dependencies requiring additional investigation.
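
Illustrative sketch of the spatial resampling step only (block averaging is one simple aggregation kernel; the study's exact resampling procedure and the SVR model are not reproduced): degrade a fine-GSD hyperspectral cube to a coarser grid before feeding it to a regression model.

    import numpy as np

    def block_average(cube, factor):
        # cube: (rows, cols, bands) reflectance array; factor: integer ratio of
        # output to input GSD, e.g. 0.063 m -> ~1 m is roughly a factor of 16.
        r, c, b = cube.shape
        r2, c2 = (r // factor) * factor, (c // factor) * factor
        trimmed = cube[:r2, :c2]
        return trimmed.reshape(r2 // factor, factor, c2 // factor, factor, b).mean(axis=(1, 3))

    cube = np.random.rand(256, 256, 272).astype(np.float32)   # toy 272-band cube
    print(cube.shape, "->", block_average(cube, 16).shape)     # (256, 256, 272) -> (16, 16, 272)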

11.
Environ Sci Pollut Res Int ; 31(29): 42088-42110, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38862797

ABSTRACT

The temporal aspect of groundwater vulnerability to contaminants such as nitrate is often overlooked, assuming vulnerability has a static nature. This study bridges this gap by employing machine learning with the Detecting Breakpoints and Estimating Segments in Trend (DBEST) algorithm to reveal the underlying relationship between nitrate, water table, vegetation cover, and precipitation time series, which are related to agricultural activities and groundwater demand in a semi-arid region. The contamination probability of the Lenjanat Plain has been mapped by comparing random forest (RF), support vector machine (SVM), and K-nearest-neighbors (KNN) models, fed with 32 input variables (DEM-derived factors, physiography, distance and density maps, time series data). Imbalanced learning and feature selection techniques were also investigated as supplementary methods, adding up to four scenarios. Results showed that the RF model, integrated with forward sequential feature selection (SFS) and the SMOTE-Tomek resampling method, outperformed the other models (F1-score: 0.94, MCC: 0.83). The SFS technique outperformed the other feature selection methods in enhancing model accuracy at the cost of computational expense, and the cost-sensitive function proved more efficient in tackling imbalanced data issues than the other investigated methods. The DBEST method identified significant breakpoints within each time series dataset, revealing a clear association between agricultural practices along the Zayandehrood River and substantial nitrate contamination within the Lenjanat region. Additionally, the groundwater vulnerability maps created using the candidate RF model and an ensemble of the best RF, SVM, and KNN models predicted mid to high levels of vulnerability in the central parts and the downhills in the southwest.
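
Minimal sketch of the winning combination described in the abstract, assuming scikit-learn and imbalanced-learn analogues (SequentialFeatureSelector, SMOTETomek) and synthetic data in place of the 32 real predictors:

    from imblearn.combine import SMOTETomek
    from imblearn.pipeline import Pipeline
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.metrics import f1_score, matthews_corrcoef
    from sklearn.model_selection import train_test_split

    # Toy stand-in for the 32 predictors and a rare "contaminated" class (~10%).
    X, y = make_classification(n_samples=1000, n_features=32, n_informative=8,
                               weights=[0.9, 0.1], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    pipe = Pipeline([
        # Forward sequential feature selection, then SMOTE oversampling followed
        # by Tomek-link cleaning, then the random forest classifier.
        ("select", SequentialFeatureSelector(rf, n_features_to_select=8,
                                             direction="forward", cv=2)),
        ("resample", SMOTETomek(random_state=0)),
        ("clf", rf),
    ])

    pipe.fit(X_tr, y_tr)
    pred = pipe.predict(X_te)
    print(f"F1 = {f1_score(y_te, pred):.2f}, MCC = {matthews_corrcoef(y_te, pred):.2f}")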


Subjects
Environmental Monitoring, Groundwater, Machine Learning, Nitrates, Nitrates/analysis, Groundwater/chemistry, Iran (Geographic), Environmental Monitoring/methods, Chemical Water Pollutants/analysis, Support Vector Machine
12.
Cortex ; 177: 130-149, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38852224

ABSTRACT

Although event-related potential (ERP) research on language processing has capitalized on key, theoretically influential components such as the N400 and P600, their measurement properties-especially the variability in their temporal and spatial parameters-have rarely been examined. The current study examined the measurement properties of the N400 and P600 effects elicited by semantic and syntactic anomalies, respectively, during sentence processing. We used a bootstrap resampling procedure to randomly draw many thousands of resamples varying in sample size and stimulus count from a larger sample of 187 participants and 40 stimulus sentences of each type per condition. Our resampling investigation focused on three issues: (a) statistical power; (b) variability in the magnitudes of the effects; and (c) variability in the temporal and spatial profiles of the effects. At the level of grand averages, the N400 and P600 effects were both robust and substantial. However, across resamples, there was a high degree of variability in effect magnitudes, onset times, and scalp distributions, which may be greater than is currently appreciated in the literature, especially for the P600 effects. These results provide a useful basis for designing future studies using these two well-established ERP components. At the same time, the results also highlight challenges that need to be addressed in future research (e.g., how best to analyze the ERP data without engaging in such questionable research practices as p-hacking).
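
Illustrative sketch of the bootstrap-over-participants procedure on simulated difference waves (the magnitude statistic, the onset criterion, and the toy ERP shape are assumptions, not the paper's pipeline):

    import numpy as np

    rng = np.random.default_rng(5)

    def bootstrap_effect(diff_waves, times, window=(0.3, 0.5), n_boot=2000):
        # diff_waves: (participants, timepoints) anomaly-minus-control voltages.
        in_win = (times >= window[0]) & (times <= window[1])
        mags, onsets = np.empty(n_boot), np.full(n_boot, np.nan)
        for b in range(n_boot):
            idx = rng.integers(0, diff_waves.shape[0], size=diff_waves.shape[0])
            grand = diff_waves[idx].mean(axis=0)            # resampled grand average
            mags[b] = grand[in_win].mean()                  # mean amplitude in window
            above = np.where(grand > 0.5 * grand[in_win].max())[0]
            if above.size:
                onsets[b] = times[above[0]]                 # crude 50%-of-peak onset
        return mags, onsets

    times = np.linspace(-0.2, 1.0, 601)
    # Toy data: a positive deflection around 400 ms plus per-participant noise.
    signal = 2.0 * np.exp(-((times - 0.4) ** 2) / (2 * 0.05 ** 2))
    diff_waves = signal + rng.normal(scale=1.0, size=(40, times.size))
    mags, onsets = bootstrap_effect(diff_waves, times)
    print(f"mean amplitude {np.mean(mags):.2f} +/- {np.std(mags):.2f} (a.u.); "
          f"onset IQR {np.nanpercentile(onsets, 25):.3f}-{np.nanpercentile(onsets, 75):.3f} s")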


Subjects
Electroencephalography, Evoked Potentials, Humans, Evoked Potentials/physiology, Electroencephalography/methods, Male, Female, Adult, Young Adult, Language, Semantics, Brain/physiology, Adolescent, Comprehension/physiology, Reading
13.
Sci Rep ; 14(1): 13097, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38849493

ABSTRACT

Customer churn remains a critical concern for businesses, highlighting the significance of retaining existing customers over acquiring new ones. Effective prediction of potential churners aids in devising robust retention policies and efficient customer management strategies. This study examines machine learning algorithms for churn prediction, addressing the inherent challenge posed by diverse and imbalanced customer churn data distributions. This paper introduces a novel approach, the Ratio-based data balancing technique, which addresses data skewness as a pre-processing step, ensuring improved accuracy in predictive modelling. This study fills gaps in the existing literature by highlighting the effectiveness of ensemble algorithms and the critical role of data balancing techniques in optimizing churn prediction models. While our research contributes a novel approach, avenues for further exploration remain. This work evaluates several machine learning algorithms (Perceptron, Multi-Layer Perceptron, Naive Bayes, Logistic Regression, K-Nearest Neighbour, and Decision Tree), alongside ensemble techniques such as Gradient Boosting and Extreme Gradient Boosting (XGBoost), on balanced datasets achieved through our proposed Ratio-based data balancing technique and commonly used data resampling. Results reveal that the proposed Ratio-based data balancing technique notably outperforms traditional Over-Sampling and Under-Sampling methods in churn prediction accuracy. Additionally, combined algorithms such as Gradient Boosting and XGBoost produced better results than single methods. We evaluated Accuracy, Precision, Recall, and F-Score, and found that these combined methods are better suited to predicting customer churn. Specifically, a 75:25 ratio with the XGBoost method gave the most promising results in the analysis presented in this work.
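
The abstract does not spell out the Ratio-based technique, so the sketch below only shows one plausible reading, assumed for illustration: resample the training data to a fixed 75:25 majority:minority ratio with an off-the-shelf under-sampler before fitting a boosting classifier (scikit-learn's GradientBoostingClassifier stands in for XGBoost here).

    from imblearn.under_sampling import RandomUnderSampler
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    # Toy churn-like data: roughly 10% churners.
    X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    # Enforce a 75:25 majority:minority ratio on the training data only
    # (sampling_strategy is the minority/majority ratio, 0.25 / 0.75 = 1/3).
    X_bal, y_bal = RandomUnderSampler(sampling_strategy=1/3, random_state=0).fit_resample(X_tr, y_tr)

    clf = GradientBoostingClassifier(random_state=0).fit(X_bal, y_bal)
    print(classification_report(y_te, clf.predict(X_te), digits=3))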

14.
Chemosphere ; 363: 142697, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38925515

ABSTRACT

The identification of arsenic (As)-contaminated areas is an important prerequisite for soil management and reclamation. Although previous studies have attempted to identify soil As contamination via machine learning (ML) methods combined with soil spectroscopy, they have ignored the rarity of As-contaminated soil samples, leading to an imbalanced learning problem. A novel ML framework was thus designed herein to solve the imbalance issue in identifying soil As contamination from soil visible and near-infrared spectra. Spectral preprocessing, imbalanced dataset resampling, and model comparisons were combined in the ML framework, and the optimal combination was selected based on the recall. In addition, Bayesian optimization was used to tune the model hyperparameters. The optimized model achieved recall, area under the curve, and balanced accuracy values of 0.83, 0.88, and 0.79, respectively, on the testing set. The recall was further improved to 0.87 with the threshold adjustment, indicating the model's excellent performance and generalization capability in classifying As-contaminated soil samples. The optimal model was applied to a global soil spectral dataset to predict areas at a high risk of soil As contamination on a global scale. The ML framework established in this study represents a milestone in the classification of soil As contamination and can serve as a valuable reference for contamination management in soil science.


Subjects
Arsenic, Bayes Theorem, Machine Learning, Soil Pollutants, Soil, Soil Pollutants/analysis, Soil/chemistry, Arsenic/analysis, Environmental Monitoring/methods
15.
Stat Med ; 43(14): 2783-2810, 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-38705726

ABSTRACT

Propensity score matching is commonly used to draw causal inference from observational survival data. However, its asymptotic properties have yet to be established, and variance estimation is still open to debate. We derive the statistical properties of the propensity score matching estimator of the marginal causal hazard ratio based on matching with replacement and a fixed number of matches. We also propose a double-resampling technique for variance estimation that takes into account the uncertainty due to propensity score estimation prior to matching.
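
Minimal sketch of 1:M propensity score matching with replacement (the hazard ratio estimator and the paper's double-resampling variance procedure are not reproduced; the data, propensity model, and M below are illustrative):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(6)

    # Toy observational data: covariates X drive treatment assignment.
    n = 2000
    X = rng.normal(size=(n, 3))
    treat = (rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)

    # 1. Estimate propensity scores with a logistic model.
    ps = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1].reshape(-1, 1)

    # 2. Match each treated unit to its M nearest controls on the propensity
    #    score, with replacement (the same control can serve several matches).
    M = 3
    controls = np.where(treat == 0)[0]
    nn = NearestNeighbors(n_neighbors=M).fit(ps[controls])
    _, idx = nn.kneighbors(ps[treat == 1])
    matched_controls = controls[idx]                 # shape: (n_treated, M)
    print("treated units:", int(treat.sum()), "matched control block:", matched_controls.shape)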


Subjects
Propensity Score, Proportional Hazards Models, Humans, Survival Analysis, Causality, Computer Simulation, Observational Studies as Topic/statistics & numerical data, Statistical Models
16.
Diagnostics (Basel) ; 14(10)2024 May 08.
Article in English | MEDLINE | ID: mdl-38786282

ABSTRACT

Breast cancer is the most prevalent type of cancer in women. Risk factor assessment can aid in directing counseling regarding risk reduction and breast cancer surveillance. This research aims to (1) investigate the relationship between various risk factors and breast cancer incidence using the BCSC (Breast Cancer Surveillance Consortium) Risk Factor Dataset and create a prediction model for assessing the risk of developing breast cancer; (2) diagnose breast cancer using the Breast Cancer Wisconsin diagnostic dataset; and (3) analyze breast cancer survivability using the SEER (Surveillance, Epidemiology, and End Results) Breast Cancer Dataset. Applying resampling techniques on the training dataset before using various machine learning techniques can affect the performance of the classifiers. The three breast cancer datasets were examined using a variety of pre-processing approaches and classification models to assess their performance in terms of accuracy, precision, F-1 scores, etc. The PCA (principal component analysis) and resampling strategies produced remarkable results. For the BCSC Dataset, the Random Forest algorithm exhibited the best performance out of the applied classifiers, with an accuracy of 87.53%. Out of the different resampling techniques applied to the training dataset for training the Random Forest classifier, the Tomek Link exhibited the best test accuracy, at 87.47%. We compared all the models used with previously used techniques. After applying the resampling techniques, the accuracy scores on the test data decreased even though the training data accuracy increased. For the Breast Cancer Wisconsin diagnostic dataset, the K-Nearest Neighbor algorithm had the best accuracy with the original dataset test set, at 94.71%, and the PCA dataset test set exhibited 95.29% accuracy for detecting breast cancer. Using the SEER Dataset, this study also explores survival analysis, employing supervised and unsupervised learning approaches to offer insights into the variables affecting breast cancer survivability. This study emphasizes the significance of individualized approaches in the management and treatment of breast cancer by incorporating phenotypic variations and recognizing the heterogeneity of the disease. Through data-driven insights and advanced machine learning, this study contributes significantly to the ongoing efforts in breast cancer research, diagnostics, and personalized medicine.

17.
Appl Radiat Isot ; 210: 111341, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38744039

ABSTRACT

We developed a novel quadratic resampling method for summing up γ-ray spectra with different calibration parameters. We investigated a long-term environmental background γ-ray spectrum by summing up 114 spectra measured using a 30% HPGe detector between 2017 and 2021. Gain variations in different measurement periods shift γ-ray peak positions by a fractional pulse-height bin size up to around 2 keV. The resampling method was applied to measure low-level background γ-ray peaks in the γ-ray spectrum in a wide energy range from 50 keV to 3 MeV. We additionally document temporal variations in the activities of major γ-ray peaks, such as 40K (1461 keV), 208Tl (2615 keV), and other typical nuclides, along with contributions from cosmic rays. The normal distribution of γ-ray background count rates, as evidenced by quantile-quantile plots, indicates consistent data collection throughout the measurement period. Consequently, we assert that the quadratic resampling method for accumulating γ-ray spectra surpasses the linear method (Bossew, 2005) in various aspects.
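
The quadratic scheme itself is not reproduced here; as context, a count-conserving linear rebinning (the baseline the authors compare against) can be sketched by interpolating the cumulative spectrum at the target bin edges, assuming toy calibration parameters:

    import numpy as np

    def rebin_spectrum(counts, edges_in, edges_out):
        # Conserve counts by interpolating the cumulative spectrum at the new
        # bin edges and differencing (a linear rebinning).
        cum = np.concatenate([[0.0], np.cumsum(counts)])
        return np.diff(np.interp(edges_out, edges_in, cum))

    rng = np.random.default_rng(7)
    edges_a = np.linspace(50.0, 3000.0, 4097)        # keV grid for calibration A
    edges_b = edges_a * 1.0007 + 0.3                 # slightly different gain/offset
    spec_b = rng.poisson(5.0, size=4096).astype(float)

    # Map spectrum B onto grid A before summing spectra from different periods.
    spec_b_on_a = rebin_spectrum(spec_b, edges_b, edges_a)
    print(f"counts before/after rebinning: {spec_b.sum():.0f} / {spec_b_on_a.sum():.0f}")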

18.
Ecology ; 105(5): e4302, 2024 May.
Article in English | MEDLINE | ID: mdl-38594213

ABSTRACT

Identifying the mechanisms underlying the changes in the distribution of species is critical to accurately predict how species have responded and will respond to climate change. Here, we take advantage of a late-1950s study on ant assemblages in a canyon near Boulder, Colorado, USA, to understand how and why species distributions have changed over a 60-year period. Community composition changed over 60 years with increasing compositional similarity among ant assemblages. Community composition differed significantly between the periods, with aspect and tree cover influencing composition. Species that foraged in broader temperature ranges became more widespread over the 60-year period. Our work highlights that shifts in community composition and biotic homogenization can occur even in undisturbed areas without strong habitat degradation. We also show the power of pairing historical and contemporary data and encourage more mechanistic studies to predict species changes under climate change.


Subjects
Ants, Ecosystem, Temperature, Ants/physiology, Animals, Colorado, Climate Change, Time Factors
19.
Entropy (Basel) ; 26(3)2024 Mar 02.
Article in English | MEDLINE | ID: mdl-38539740

ABSTRACT

The knowledge of the causal mechanisms underlying one single system may not be sufficient to answer certain questions. One can gain additional insights from comparing and contrasting the causal mechanisms underlying multiple systems and uncovering consistent and distinct causal relationships. For example, discovering common molecular mechanisms among different diseases can lead to drug repurposing. The problem of comparing causal mechanisms among multiple systems is non-trivial, since the causal mechanisms are usually unknown and need to be estimated from data. If we estimate the causal mechanisms from data generated from different systems and directly compare them (the naive method), the result can be sub-optimal. This is especially true if the data generated by the different systems differ substantially with respect to their sample sizes. In this case, the quality of the estimated causal mechanisms for the different systems will differ, which can in turn affect the accuracy of the estimated similarities and differences among the systems via the naive method. To mitigate this problem, we introduced the bootstrap estimation and the equal sample size resampling estimation method for estimating the difference between causal networks. Both of these methods use resampling to assess the confidence of the estimation. We compared these methods with the naive method in a set of systematically simulated experimental conditions with a variety of network structures and sample sizes, and using different performance metrics. We also evaluated these methods on various real-world biomedical datasets covering a wide range of data designs.
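
Illustrative sketch of the equal sample size resampling idea, with a thresholded correlation graph standing in for a real causal discovery algorithm (an assumption; the paper's estimators are not specified in the abstract):

    import numpy as np

    rng = np.random.default_rng(8)

    def edge_set(data, threshold=0.2):
        # Stand-in "causal" structure: absolute correlations above a threshold.
        corr = np.corrcoef(data, rowvar=False)
        return np.abs(np.triu(corr, k=1)) > threshold

    def diff_confidence(data_a, data_b, n_resamples=500):
        # Equal sample size resampling: subsample both systems to the smaller n
        # before estimating and comparing structures, repeated to gauge confidence.
        n = min(len(data_a), len(data_b))
        diffs = np.zeros(edge_set(data_a).shape)
        for _ in range(n_resamples):
            a = data_a[rng.choice(len(data_a), n, replace=True)]
            b = data_b[rng.choice(len(data_b), n, replace=True)]
            diffs += (edge_set(a) != edge_set(b))
        return diffs / n_resamples                   # per-edge difference frequency

    data_a = rng.normal(size=(2000, 5))
    data_a[:, 1] += 0.6 * data_a[:, 0]               # edge 0-1 only in system A
    data_b = rng.normal(size=(300, 5))
    print(np.round(diff_confidence(data_a, data_b), 2))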

20.
Proc Biol Sci ; 291(2018): 20240079, 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38471547

ABSTRACT

The fast rate of replacement of natural areas by expanding cities is a key threat to wildlife worldwide. Many wild species occur in cities, yet little is known on the dynamics of urban wildlife assemblages due to species' extinction and colonization that may occur in response to the rapidly evolving conditions within urban areas. Namely, species' ability to spread within urban areas, besides habitat preferences, is likely to shape the fate of species once they occur in a city. Here we use a long-term dataset on mammals occurring in one of the largest and most ancient cities in Europe to assess whether and how spatial spread and association with specific habitats drive the probability of local extinction within cities. Our analysis included mammalian records dating between years 1832 and 2023, and revealed that local extinctions in urban areas are biased towards species associated with wetlands and that were naturally rare within the city. Besides highlighting the role of wetlands within urban areas for conserving wildlife, our work also highlights the importance of long-term biodiversity monitoring in highly dynamic habitats such as cities, as a key asset to better understand wildlife trends and thus foster more sustainable and biodiversity-friendly cities.


Subjects
Ecosystem, Wetlands, Animals, Cities, Mammals, Biodiversity, Wild Animals