Results 1 - 20 of 2,042
1.
Pharm Stat ; 2024 Aug 04.
Article in English | MEDLINE | ID: mdl-39099192

ABSTRACT

The estimands framework outlined in ICH E9 (R1) describes the components needed to precisely define the effects to be estimated in clinical trials, which includes how post-baseline 'intercurrent' events (IEs) are to be handled. In late-stage clinical trials, it is common to handle IEs like 'treatment discontinuation' using the treatment policy strategy and target the treatment effect on outcomes regardless of treatment discontinuation. For continuous repeated measures, this type of effect is often estimated using all observed data before and after discontinuation using either a mixed model for repeated measures (MMRM) or multiple imputation (MI) to handle any missing data. In their basic form, both estimation methods ignore treatment discontinuation in the analysis and may therefore be biased if patient outcomes after treatment discontinuation differ from those of patients still assigned to treatment, and if missing data are more common for patients who have discontinued treatment. We therefore propose and evaluate a set of MI models that can accommodate differences between outcomes before and after treatment discontinuation. The models are evaluated in the context of planning a Phase 3 trial for a respiratory disease. We show that analyses ignoring treatment discontinuation can introduce substantial bias and can sometimes underestimate variability. We also show that some of the MI models proposed can successfully correct the bias, but inevitably lead to increases in variance. We conclude that some of the proposed MI models are preferable to the traditional analysis ignoring treatment discontinuation, but the precise choice of MI model will likely depend on the trial design, disease of interest and amount of observed and missing data following treatment discontinuation.
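For illustration only (these are not the paper's MI models), the sketch below shows how an imputation model can be made to distinguish on- and off-treatment outcomes by including an off-treatment indicator and its interaction with the randomised arm among the predictors. It uses scikit-learn's IterativeImputer on simulated single-visit data; column names and the data-generating model are hypothetical.

```python
# Sketch: multiple imputation that lets outcomes differ before/after
# treatment discontinuation (simplified to one post-baseline visit).
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "treat": rng.integers(0, 2, n),    # randomised arm
    "base": rng.normal(50, 10, n),     # baseline outcome
    "off_trt": rng.integers(0, 2, n),  # discontinued before the visit?
})
# Hypothetical data-generating model: the treatment effect shrinks after
# discontinuation; some post-discontinuation outcomes are missing.
df["y"] = (df["base"] - 5 * df["treat"] * (1 - df["off_trt"])
           - 2 * df["treat"] * df["off_trt"] + rng.normal(0, 5, n))
df.loc[(df["off_trt"] == 1) & (rng.random(n) < 0.5), "y"] = np.nan

# Include the off-treatment indicator and its interaction with arm in the
# imputation model so imputed values reflect post-discontinuation behaviour.
df["treat_x_off"] = df["treat"] * df["off_trt"]
estimates = []
for m in range(20):                    # 20 imputed datasets
    imp = IterativeImputer(sample_posterior=True, random_state=m)
    completed = imp.fit_transform(df[["treat", "base", "off_trt",
                                      "treat_x_off", "y"]])
    d = df.copy()
    d["y"] = completed[:, -1]
    estimates.append(d.groupby("treat")["y"].mean().diff().iloc[-1])

# (Rubin's rules for the pooled variance are omitted for brevity.)
print("Pooled treatment-policy effect estimate:", np.mean(estimates))
```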

2.
Sci Rep ; 14(1): 18027, 2024 Aug 04.
Article in English | MEDLINE | ID: mdl-39098844

ABSTRACT

Ranked set sampling (RSS) is known to increase the efficiency of estimators compared with simple random sampling. Missingness creates a gap in the information that needs to be addressed before proceeding to estimation, yet little work has been carried out on handling missingness under RSS. This paper proposes some logarithmic-type imputation methods for estimating the population mean under RSS using auxiliary information. The properties of the suggested imputation procedures are examined. A simulation study shows that the proposed imputation procedures yield better results than some existing imputation procedures. A few real-data applications of the proposed imputation procedures are also provided to corroborate the simulation study.
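The paper's specific logarithmic estimators are not reproduced here; the toy sketch below only illustrates the general mechanics being discussed, namely drawing a ranked set sample using an auxiliary variable and filling in missing study values with a simple ratio-type rule before estimating the population mean.

```python
# Toy sketch of auxiliary-variable imputation in a ranked set sample.
import numpy as np

rng = np.random.default_rng(0)

def ranked_set_sample(x, y, set_size=3, cycles=40):
    """Draw an RSS of size set_size*cycles, ranking each set on the auxiliary x."""
    xs, ys = [], []
    for _ in range(cycles):
        for r in range(set_size):
            idx = rng.choice(len(x), set_size, replace=False)
            order = idx[np.argsort(x[idx])]  # rank the set on x
            pick = order[r]                  # keep the r-th order statistic
            xs.append(x[pick]); ys.append(y[pick])
    return np.array(xs), np.array(ys)

# Population where y is roughly proportional to the auxiliary variable x
N = 10_000
x_pop = rng.gamma(4.0, 2.0, N)
y_pop = 3.0 * x_pop + rng.normal(0, 2.0, N)

x_rss, y_rss = ranked_set_sample(x_pop, y_pop)
missing = rng.random(len(y_rss)) < 0.25      # 25% of y missing at random
y_obs = np.where(missing, np.nan, y_rss)

# Ratio-type imputation: scale each missing unit's auxiliary value by the
# ratio of responding means, then estimate the population mean of y.
ratio = np.nanmean(y_obs) / x_rss[~missing].mean()
y_imp = np.where(missing, ratio * x_rss, y_obs)
print("true mean:", y_pop.mean(), " estimate:", y_imp.mean())
```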

3.
Mol Ecol Resour ; : e13992, 2024 Jul 06.
Article in English | MEDLINE | ID: mdl-38970328

ABSTRACT

Current methodologies of genome-wide single-nucleotide polymorphism (SNP) genotyping produce large amounts of missing data that may affect statistical inference and bias the outcome of experiments. Genotype imputation is routinely used in well-studied species to buffer the impact on downstream analysis, and several algorithms are available to fill in missing genotypes. The lack of reference haplotype panels precludes the use of these methods in genomic studies on non-model organisms. As an alternative, machine learning algorithms can be employed to explore the genotype data and estimate the missing genotypes. Here, we propose an imputation method based on self-organizing maps (SOM), a widely used type of neural network formed by spatially distributed neurons that cluster similar inputs onto nearby neurons. The method explores genotype datasets to select SNP loci used to build binary vectors from the genotypes, and initializes and trains a neural network for each query missing SNP genotype. The SOM-derived clustering is then used to impute the best genotype. To automate the imputation process, we have implemented gtImputation, an open-source application programmed in Python3 with a user-friendly GUI to facilitate the whole process. The method's performance was validated by comparing its accuracy, precision and sensitivity on several benchmark genotype datasets with those of other available imputation algorithms. Our approach produced highly accurate and precise genotype imputations even for SNPs with low-frequency alleles and outperformed the other algorithms, especially for datasets from mixed populations with unrelated individuals.
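A toy sketch of the general SOM-clustering idea (not the gtImputation implementation): train a self-organizing map on genotype dosage vectors, map each individual to its best-matching unit, and fill each missing genotype with the most common value among samples on the same node. The minisom package, the map size and the dosage encoding are assumptions for illustration.

```python
# Toy SOM-based genotype imputation on simulated 0/1/2 dosages.
import numpy as np
from minisom import MiniSom  # pip install minisom

rng = np.random.default_rng(3)
n_ind, n_snp = 200, 50
geno = rng.integers(0, 3, size=(n_ind, n_snp)).astype(float)
mask = rng.random(geno.shape) < 0.05          # 5% missing
geno_missing = geno.copy()
geno_missing[mask] = np.nan

# Temporarily mean-fill so the SOM can be trained on complete vectors.
col_means = np.nanmean(geno_missing, axis=0)
filled = np.where(np.isnan(geno_missing), col_means, geno_missing)

som = MiniSom(6, 6, n_snp, sigma=1.0, learning_rate=0.5, random_seed=3)
som.train_random(filled, 2000)

# Map each individual to its best-matching unit, then impute each missing
# genotype with the rounded mean dosage of its SOM cluster.
bmus = np.array([som.winner(v) for v in filled])
imputed = geno_missing.copy()
for i, j in zip(*np.where(mask)):
    same_node = (bmus == bmus[i]).all(axis=1)
    cluster_vals = geno_missing[same_node, j]
    cluster_vals = cluster_vals[~np.isnan(cluster_vals)]
    imputed[i, j] = np.round(cluster_vals.mean()) if cluster_vals.size \
        else np.round(col_means[j])

print("toy imputation accuracy:", (imputed[mask] == geno[mask]).mean())
```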

4.
Health Inf Sci Syst ; 12(1): 37, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38974364

ABSTRACT

Obtaining high-quality data sets from raw data is a key step before data exploration and analysis. In the medical domain, a large amount of data needs quality improvement before it can be used to analyze patients' health conditions. There has been much research on data extraction, data cleaning and data imputation individually. However, few frameworks integrate these three techniques, leaving datasets deficient in accuracy, consistency and integrity. In this paper, a multi-source heterogeneous data enhancement framework based on a lakehouse, MHDP, is proposed, which comprises three steps: data extraction, data cleaning and data imputation. In the data extraction step, a data fusion technique is offered to handle multi-modal and multi-source heterogeneous data. In the data cleaning step, we propose HoloCleanX, which provides a convenient interactive procedure. In the data imputation step, multiple imputation (MI) and the state-of-the-art algorithm SAITS are applied for different situations. We evaluate our framework via three tasks: clustering, classification and strategy prediction. The experimental results demonstrate the effectiveness of our data enhancement framework.

5.
Annu Rev Stat Appl ; 11: 255-277, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38962579

ABSTRACT

The landscape of survival analysis is constantly being revolutionized to answer biomedical challenges, most recently the statistical challenge of censored covariates rather than outcomes. There are many promising strategies to tackle censored covariates, including weighting, imputation, maximum likelihood, and Bayesian methods. Still, this is a relatively fresh area of research, different from the areas of censored outcomes (i.e., survival analysis) or missing covariates. In this review, we discuss the unique statistical challenges encountered when handling censored covariates and provide an in-depth review of existing methods designed to address those challenges. We emphasize each method's relative strengths and weaknesses, providing recommendations to help investigators pinpoint the best approach to handling censored covariates in their data.

6.
Am J Epidemiol ; 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38960664

ABSTRACT

It is unclear how the risk of post-COVID symptoms evolved during the pandemic, especially before the spread of Severe Acute Respiratory Syndrome Coronavirus 2 variants and the availability of vaccines. We used modified Poisson regressions to compare the risk of six-month post-COVID symptoms and their associated risk factors according to the period of the first acute COVID-19 episode: during the French first (March-May 2020) or second (September-November 2020) wave. Non-response weights and multiple imputation were used to handle missing data. Among participants aged 15 or over in a national population-based cohort, the risk of post-COVID symptoms was 14.6% (95% CI: 13.9%, 15.3%) in March-May 2020, versus 7.0% (95% CI: 6.3%, 7.7%) in September-November 2020 (adjusted RR: 1.36, 95% CI: 1.20, 1.55). For both periods, the risk was higher in the presence of baseline physical condition(s) and increased with the number of acute symptoms. During the first wave, the risk was also higher for women and in the presence of baseline mental condition(s), and it varied with educational level. In France in 2020, the risk of six-month post-COVID symptoms was higher during the first wave than the second. This difference was observed before the spread of variants and the availability of vaccines.
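As background, a "modified Poisson regression" for a binary outcome is commonly implemented as a Poisson GLM with robust (sandwich) standard errors, yielding risk ratios rather than odds ratios. The sketch below shows that mechanic with statsmodels on simulated data; the variable names and effect sizes are hypothetical, not the cohort's data.

```python
# Illustrative modified Poisson regression (Poisson GLM + robust SEs).
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 5000
df = pd.DataFrame({
    "wave1": rng.integers(0, 2, n),      # infected in first vs second wave
    "female": rng.integers(0, 2, n),
    "n_acute_sx": rng.poisson(3, n),     # number of acute symptoms
})
p = 0.05 * np.exp(0.3 * df["wave1"] + 0.1 * df["female"]
                  + 0.08 * df["n_acute_sx"])
df["post_covid"] = (rng.random(n) < np.clip(p, 0, 1)).astype(int)

model = smf.glm("post_covid ~ wave1 + female + n_acute_sx",
                data=df, family=sm.families.Poisson())
fit = model.fit(cov_type="HC1")          # robust "sandwich" variance
print(np.exp(fit.params))                # adjusted risk ratios
print(fit.conf_int().apply(np.exp))      # 95% CIs on the RR scale
```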

7.
Psychometrika ; 2024 Jul 06.
Article in English | MEDLINE | ID: mdl-38971882

ABSTRACT

The Ising model has become a popular psychometric model for analyzing item response data. The statistical inference of the Ising model is typically carried out via a pseudo-likelihood, as the standard likelihood approach suffers from a high computational cost when there are many variables (i.e., items). Unfortunately, the presence of missing values can hinder the use of pseudo-likelihood, and a listwise deletion approach for missing data treatment may introduce a substantial bias into the estimation and sometimes yield misleading interpretations. This paper proposes a conditional Bayesian framework for Ising network analysis with missing data, which integrates a pseudo-likelihood approach with iterative data imputation. An asymptotic theory is established for the method. Furthermore, a computationally efficient Pólya-Gamma data augmentation procedure is proposed to streamline the sampling of model parameters. The method's performance is shown through simulations and a real-world application to data on major depressive and generalized anxiety disorders from the National Epidemiological Survey on Alcohol and Related Conditions (NESARC).
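The pseudo-likelihood idea referred to here can be illustrated with nodewise logistic regressions: each binary item is regressed on all the others, and the coefficients approximate the pairwise interaction parameters. The sketch below shows that basic complete-data version only; it is not the paper's conditional Bayesian framework with iterative imputation and Pólya-Gamma augmentation.

```python
# Nodewise-logistic-regression sketch of Ising pseudo-likelihood estimation.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(11)
n, p = 1000, 8
X = rng.integers(0, 2, size=(n, p))   # binary item responses (toy data)

J = np.zeros((p, p))                   # estimated interaction matrix
for j in range(p):
    others = np.delete(np.arange(p), j)
    clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
    clf.fit(X[:, others], X[:, j])     # regress item j on all other items
    J[j, others] = clf.coef_[0]

J_sym = (J + J.T) / 2                  # symmetrise the two estimates per pair
print(np.round(J_sym, 2))
```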

8.
Biology (Basel) ; 13(7)2024 Jul 09.
Article in English | MEDLINE | ID: mdl-39056705

ABSTRACT

Single-cell transcriptomics (scRNA-seq) is revolutionizing biological research, yet it faces challenges such as inefficient transcript capture and noise. To address these challenges, methods like neighbor averaging or graph diffusion are used. These methods often rely on k-nearest neighbor graphs from low-dimensional manifolds. However, scRNA-seq data suffer from the 'curse of dimensionality', leading to the over-smoothing of data when using imputation methods. To overcome this, sc-PHENIX employs a PCA-UMAP diffusion method, which enhances the preservation of data structures and allows for a refined use of PCA dimensions and diffusion parameters (e.g., k-nearest neighbors, exponentiation of the Markov matrix) to minimize noise introduction. This approach enables a more accurate construction of the exponentiated Markov matrix (cell neighborhood graph), surpassing methods like MAGIC. sc-PHENIX significantly mitigates over-smoothing, as validated through various scRNA-seq datasets, demonstrating improved cell phenotype representation. Applied to a multicellular tumor spheroid dataset, sc-PHENIX identified known extreme phenotype states, showcasing its effectiveness. sc-PHENIX is open-source and available for use and modification.
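A rough sketch of a PCA-then-UMAP-then-diffusion imputation pass, conceptually in the spirit of MAGIC-style smoothing as described above but not the sc-PHENIX implementation: build a k-nearest-neighbour graph in the reduced space, row-normalise it to a Markov matrix, exponentiate it, and diffuse the expression values. The umap-learn package is assumed to be installed; the counts are simulated.

```python
import numpy as np
import umap
from sklearn.decomposition import PCA
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(5)
counts = rng.poisson(1.0, size=(500, 200)).astype(float)  # cells x genes
counts[rng.random(counts.shape) < 0.6] = 0.0              # dropout noise
log_counts = np.log1p(counts)

pcs = PCA(n_components=30, random_state=0).fit_transform(log_counts)
emb = umap.UMAP(n_components=10, n_neighbors=15,
                random_state=0).fit_transform(pcs)

# kNN graph in the PCA-UMAP space, row-normalised to a Markov matrix,
# then exponentiated to diffuse expression across cell neighbourhoods.
A = kneighbors_graph(emb, n_neighbors=15, mode="connectivity").toarray()
A = np.maximum(A, A.T)                                    # symmetrise
M = A / A.sum(axis=1, keepdims=True)
t = 3                                                     # diffusion time
imputed = np.linalg.matrix_power(M, t) @ log_counts
print(imputed.shape)
```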

9.
Open Forum Infect Dis ; 11(7): ofae333, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39015347

ABSTRACT

Background: Predicting cause-specific mortality among people with HIV (PWH) could facilitate targeted care to improve survival. We assessed discrimination of the Veterans Aging Cohort Study (VACS) Index 2.0 in predicting cause-specific mortality among PWH on antiretroviral therapy (ART). Methods: Using Antiretroviral Therapy Cohort Collaboration data for PWH who initiated ART between 2000 and 2018, VACS Index 2.0 scores (higher scores indicate worse prognosis) were calculated around a randomly selected visit date at least 1 year after ART initiation. Missingness in VACS Index 2.0 variables was addressed through multiple imputation. Cox models estimated associations between VACS Index 2.0 and causes of death, with discrimination evaluated using Harrell's C-statistic. Absolute mortality risk was modelled using flexible parametric survival models. Results: Of 59 741 PWH (mean age: 43 years; 80% male), the mean VACS Index 2.0 at baseline was 41 (range: 0-129). Among 2425 deaths over 168 162 person-years of follow-up (median: 2.6 years/person), AIDS (n = 455) and non-AIDS-defining cancers (n = 452) were the most common causes. Predicted 5-year mortality for PWH with a mean VACS Index 2.0 score of 38 at baseline was 1% and approximately doubled for every 10-unit increase. The 5-year all-cause mortality C-statistic was 0.83. Discrimination with the VACS Index 2.0 was highest for deaths resulting from AIDS (0.91), liver-related causes (0.91), respiratory-related causes (0.89), non-AIDS infections (0.87), and non-AIDS-defining cancers (0.83), and lowest for suicides/accidental deaths (0.65). Conclusions: For deaths among PWH, discrimination with the VACS Index 2.0 was highest for deaths with measurable physiological causes and lowest for suicides/accidental deaths.
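For readers unfamiliar with the workflow, the sketch below shows how a prognostic score's discrimination can be checked with a Cox model and Harrell's C-statistic using the lifelines package. The score and data are simulated stand-ins, not the VACS Index 2.0 or the ART-CC cohort.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(2024)
n = 2000
score = rng.normal(40, 15, n)                     # hypothetical risk score
time = rng.exponential(np.exp(5 - 0.03 * score))  # higher score -> earlier event
event = (rng.random(n) < 0.7).astype(int)         # some censoring
df = pd.DataFrame({"score": score, "time": time, "event": event})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()

# Harrell's C: higher predicted survival should pair with later events,
# hence the negative sign on the partial hazard.
c = concordance_index(df["time"], -cph.predict_partial_hazard(df), df["event"])
print("Harrell's C:", round(c, 3))
```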

10.
Comput Biol Med ; 179: 108813, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38955127

ABSTRACT

BACKGROUND: Missing data is a common challenge in mass spectrometry-based metabolomics and can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. METHOD: In this study, we propose a novel method that leverages information from WGS data and reference metabolites to impute unknown metabolites. Our approach utilizes a multi-scale variational autoencoder to jointly model the burden score, polygenic risk score (PGS), and linkage disequilibrium (LD)-pruned single nucleotide polymorphisms (SNPs) for feature extraction and missing metabolomics data imputation. By learning latent representations of both omics data types, our method can effectively impute missing metabolomics values based on genomic information. RESULTS: We evaluate the performance of our method on empirical metabolomics datasets with missing values and demonstrate its superiority over conventional imputation techniques. Using burden scores derived from 35 template metabolites, PGS, and LD-pruned SNPs, the proposed method achieved R2 scores > 0.01 for 71.55% of metabolites. CONCLUSION: The integration of WGS data in metabolomics imputation not only improves data completeness but also enhances downstream analyses, paving the way for more comprehensive and accurate investigations of metabolic pathways and disease associations. Our findings offer valuable insights into the potential benefits of utilizing WGS data for metabolomics data imputation and underscore the importance of leveraging multi-modal data integration in precision medicine research.

11.
Multivariate Behav Res ; : 1-29, 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38997153

ABSTRACT

Missingness in intensive longitudinal data triggered by latent factors constitutes one type of nonignorable missingness that can generate simultaneous missingness across multiple items on each measurement occasion. To address this issue, we propose a multiple imputation (MI) strategy called MI-FS, which incorporates factor scores, lag/lead variables, and missing data indicators into the imputation model. In the context of process factor analysis (PFA), we conducted a Monte Carlo simulation study to compare the performance of MI-FS to listwise deletion (LD), MI with manifest variables (MI-MV, which implements MI on both dependent variables and covariates), and partial MI with MVs (PMI-MV, which implements MI on covariates and handles missing dependent variables via full-information maximum likelihood) under different conditions. Across conditions, we found MI-based methods overall outperformed the LD; the MI-FS approach yielded lower root mean square errors (RMSEs) and higher coverage rates for auto-regression (AR) parameters compared to MI-MV; and the PMI-MV and MI-MV approaches yielded higher coverage rates for most parameters except AR parameters compared to MI-FS. These approaches were also compared using an empirical example investigating the relationships between negative affect and perceived stress over time. Recommendations on when and how to incorporate factor scores into MI processes were discussed.
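A drastically simplified sketch of the general idea of enriching an imputation model with factor scores and lag/lead variables for intensive longitudinal items (the spirit of MI-FS, not the authors' exact procedure). One subject, one latent factor, a crude mean-based factor-score proxy, and scikit-learn's IterativeImputer are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(8)
T = 200                                     # measurement occasions
latent = np.zeros(T)
for t in range(1, T):                       # AR(1) latent process
    latent[t] = 0.6 * latent[t - 1] + rng.normal()
items = pd.DataFrame({f"y{j}": latent + rng.normal(0, 0.5, T)
                      for j in range(3)})

# Latent-factor-driven missingness: whole occasions go missing together.
drop = rng.random(T) < 1 / (1 + np.exp(-(latent - 1)))
items.loc[drop, :] = np.nan

# Imputation model: items plus a provisional factor score and lag/lead copies.
aug = items.copy()
aug["fscore"] = items.mean(axis=1)          # crude factor-score proxy
for col in ["y0", "y1", "y2", "fscore"]:
    aug[f"{col}_lag"] = aug[col].shift(1)
    aug[f"{col}_lead"] = aug[col].shift(-1)

imp = IterativeImputer(sample_posterior=True, random_state=0)
completed = pd.DataFrame(imp.fit_transform(aug), columns=aug.columns)
print(completed[["y0", "y1", "y2"]].head())
```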

12.
Neural Netw ; 179: 106512, 2024 Jul 11.
Article in English | MEDLINE | ID: mdl-39032394

ABSTRACT

Network embedding is a general-purpose machine learning technique that converts network data from non-Euclidean space to Euclidean space, facilitating downstream analyses for the networks. However, existing embedding methods are often optimization-based, with the embedding dimension determined in a heuristic or ad hoc way, which can cause potential bias in downstream statistical inference. Additionally, existing deep embedding methods can suffer from a nonidentifiability issue due to the universal approximation power of deep neural networks. We address these issues within a rigorous statistical framework. We treat the embedding vectors as missing data, reconstruct the network features using a sparse decoder, and simultaneously impute the embedding vectors and train the sparse decoder using an adaptive stochastic gradient Markov chain Monte Carlo (MCMC) algorithm. Under mild conditions, we show that the sparse decoder provides a parsimonious mapping from the embedding space to network features, enabling effective selection of the embedding dimension and overcoming the nonidentifiability issue encountered by existing deep embedding methods. Furthermore, we show that the embedding vectors converge weakly to a desired posterior distribution in the 2-Wasserstein distance, addressing the potential bias issue experienced by existing embedding methods. This work lays down the first theoretical foundation for network embedding within the framework of missing data imputation.

13.
Front Med (Lausanne) ; 11: 1407376, 2024.
Article in English | MEDLINE | ID: mdl-39071085

ABSTRACT

Introduction: Globally, cardiovascular disease (CVD) remains one of the leading causes of death and requires improved diagnostic methods for effective detection of early signs and prediction of disease outcomes. Current diagnostic tools are cumbersome and imprecise, especially for complex diseases, which motivates incorporating new machine learning applications into differential diagnosis. Methods: This paper presents a machine learning approach that uses MICE to mitigate missing data, the IQR to handle outliers, and SMOTE to address class imbalance. To select optimal features, we introduce the Hybrid 2-Tier Grasshopper Optimization with L2 regularization methodology, which we call GOL2-2T. Predictive modelling is performed with an AdaBoost decision fusion (ABDF) ensemble learning algorithm, with a babysitting technique used for hyperparameter tuning. Accuracy, recall, and the AUC score are used to assess the model. Results: Our heart disease prediction model achieved an accuracy of 83.0% and a balanced F1 score of 84.0%. The integration of SMOTE, IQR outlier detection, MICE, and GOL2-2T feature selection enhances robustness while improving predictive performance. ABDF further reduced model impurity and proved effective at predicting heart disease. Discussion: These findings demonstrate the value of such machine learning methodologies for medical diagnostics, including improved early recognition and trustworthy tools for clinicians. However, the model's applicability depends on the dataset used. Further work is needed to replicate the model across different datasets and samples, since it will be important to establish whether the results generalize to populations not represented in the current study.
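A rough reconstruction of the kind of preprocessing-plus-modelling pipeline described above, assembled from off-the-shelf components: iterative (MICE-style) imputation, IQR clipping of outliers, SMOTE oversampling, and an AdaBoost classifier. The grasshopper-based feature selection (GOL2-2T) and babysitting tuning are not reproduced, and the data are simulated.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 12))
y = (rng.random(1000) < 0.15).astype(int)       # imbalanced toy labels
X[rng.random(X.shape) < 0.05] = np.nan          # some missing values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

imp = IterativeImputer(random_state=0)          # MICE-style chained imputation
X_tr = imp.fit_transform(X_tr)
X_te = imp.transform(X_te)

# IQR clipping of outliers, with bounds learned on the training data only.
q1, q3 = np.percentile(X_tr, [25, 75], axis=0)
low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
X_tr, X_te = np.clip(X_tr, low, high), np.clip(X_te, low, high)

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

clf = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X_bal, y_bal)
pred = clf.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred),
      "F1:", f1_score(y_te, pred),
      "AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```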

14.
Eur Geriatr Med ; 2024 Jul 26.
Article in English | MEDLINE | ID: mdl-39060781

ABSTRACT

PURPOSE: The purpose of the present study was to comprehensively examine the association between inadequate physical activity (PA), cognitive activity (CA), and social activity (SA) and the development of sarcopenia. METHODS: We conducted a two-wave survey. In the first-wave survey, we asked participants five questions for each of the three categories: PA, CA, and SA. The low-activity group was defined as those who fell into the decline category for one or more of the five questions. In both Wave 1 and Wave 2, we assessed the sarcopenia status of our participants. The revised European Working Group on Sarcopenia in Older People 2 definition was used to determine sarcopenia, and the Asian Working Group for Sarcopenia criteria were used for the cut-off points for muscle mass, grip strength, and walking speed. RESULTS: In the second wave, we were able to follow 2,530 participants (mean age 75.0 ± 4.7 years, 47.8% men). A multivariable logistic regression showed that low-PA participants face a higher risk of incident sarcopenia, both before and after multiple imputation (odds ratio [OR] 1.62, 95% confidence interval [CI] 1.22-2.15 before imputation; OR 1.62, 95% CI 1.21-2.18 after imputation); the low-SA group also showed a higher risk of incident sarcopenia both before and after multiple imputation (OR 1.31, 95% CI 1.05-1.64 before imputation; OR 1.33, 95% CI 1.07-1.65 after imputation). CONCLUSION: Low PA and low SA each independently predicted incident sarcopenia in later life. Encouraging not only PA but also SA may be effective in preventing sarcopenia among older adults.

15.
bioRxiv ; 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-39026852

ABSTRACT

Tensor factorization is a dimensionality reduction method applied to multidimensional arrays. These methods are useful for identifying patterns within a variety of biomedical datasets due to their ability to preserve the organizational structure of experiments, thereby aiding in generating meaningful insights. However, missing data in the datasets being analyzed can impose challenges. Tensor factorization can be performed with some level of missing data and used to reconstruct a complete tensor. However, while tensor methods may impute these missing values, the choice of fitting algorithm may influence the fidelity of the imputations. Previous approaches, based on alternating least squares with prefilled values or on direct optimization, suffer from introduced bias or slow computational performance. In this study, we propose that censored least squares can better handle missing values in data structured in tensor form. We ran censored least squares on four different biological datasets and compared its performance against alternating least squares with prefilled values and direct optimization. We used the imputation error and the ability to infer masked values to benchmark missing-data performance. Censored least squares appeared best suited for the analysis of high-dimensional biological data, based on accuracy and convergence metrics across several datasets.
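The sketch below illustrates the general task of fitting a CP factorization only to observed entries and reading imputed values from the reconstruction. It uses TensorLy's masked ALS (passing an observed/missing mask to parafac) as a convenient stand-in; it is not the censored-least-squares solver evaluated in the paper, and the low-rank toy tensor is simulated.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(0)
rank = 3
# Low-rank ground-truth tensor plus noise, with 30% of entries hidden.
A = rng.normal(size=(20, rank))
B = rng.normal(size=(15, rank))
C = rng.normal(size=(10, rank))
truth = np.einsum("ir,jr,kr->ijk", A, B, C)
noisy = truth + 0.05 * rng.normal(size=truth.shape)
mask = rng.random(truth.shape) > 0.3            # True where observed
observed = np.where(mask, noisy, 0.0)

# Fit CP only to observed entries, then impute from the reconstruction.
cp = parafac(tl.tensor(observed), rank=rank,
             mask=tl.tensor(mask.astype(float)),
             n_iter_max=500, init="random", random_state=0)
reconstruction = tl.cp_to_tensor(cp)

rel_err = (np.linalg.norm((reconstruction - truth)[~mask])
           / np.linalg.norm(truth[~mask]))
print("relative error on held-out entries:", round(float(rel_err), 3))
```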

16.
Sci Rep ; 14(1): 17167, 2024 07 26.
Article in English | MEDLINE | ID: mdl-39060355

ABSTRACT

Cephalosporin antibiotics are widely used in clinical settings, but they can cause hypersensitivity reactions, which may be influenced by genetic factors such as the expression of Human leukocyte antigen (HLA) molecules. This study aimed to investigate whether specific HLA alleles were associated with an increased risk of adverse reactions to cephalosporins among individuals in the Taiwanese population. This retrospective case-control study analyzed data from the Taiwan Precision Medicine Initiative (TPMI) on 27,933 individuals who received cephalosporin exposure and had HLA allele genotyping information available. Using logistic regression analyses, we examined the associations between HLA genotypes, comorbidities, allergy risk, and severity. Among the study population, 278 individuals had cephalosporin allergy and 2780 were in the control group. Our results indicated that certain HLA alleles, including HLA-B*55:02 (OR = 1.76, 95% CI 1.18-2.61, p = 0.005), HLA-C*01:02 (OR = 1.36, 95% CI 1.05-1.77, p = 0.018), and HLA-DQB1*06:09 (OR = 2.58, 95% CI 1.62-4.12, p < 0.001), were significantly associated with an increased risk of cephalosporin allergy reactions. Additionally, the HLA-C*01:02 allele genotype was significantly associated with a higher risk of severe allergy (OR = 2.33, 95% CI 1.05-5.15, p = 0.04). This study identified significant associations between HLA alleles and an increased risk of cephalosporin allergy, which can aid in early detection and prediction of adverse drug reactions to cephalosporins. Furthermore, our study highlights the importance of HLA typing in drug safety and expanding our knowledge of drug hypersensitivity syndromes.


Subject(s)
Alleles; Cephalosporins; Drug Hypersensitivity; Humans; Cephalosporins/adverse effects; Taiwan/epidemiology; Male; Female; Drug Hypersensitivity/genetics; Drug Hypersensitivity/epidemiology; Middle Aged; Case-Control Studies; Retrospective Studies; HLA Antigens/genetics; Adult; Aged; Genotype; Genetic Predisposition to Disease; Anti-Bacterial Agents/adverse effects
17.
G3 (Bethesda) ; 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39041837

ABSTRACT

With the rapid and significant cost reduction of next-generation sequencing, low-coverage whole-genome sequencing (lcWGS) followed by genotype imputation is becoming a cost-effective alternative to SNP (single nucleotide polymorphism) array genotyping. The objectives of this study were two-fold: 1) construct a haplotype reference panel for genotype imputation from lcWGS data in rainbow trout (Oncorhynchus mykiss); and 2) evaluate the concordance between imputed genotypes and SNP-array genotypes in two breeding populations. Medium-coverage (12x) whole-genome sequences were obtained from a total of 410 fish representing five breeding populations with various spawning dates. The short-read sequences were mapped to the rainbow trout reference genome, and genetic variants were identified using GATK. After data filtering, 20,434,612 biallelic SNPs were retained. The reference panel was phased with SHAPEIT5, and was used as a reference to impute genotypes from lcWGS data using GLIMPSE2. A total of 90 fish from the Troutlodge November breeding population were sequenced with an average coverage of 1.3x, and these fish were also genotyped with the Axiom 57K rainbow trout SNP array. The concordance between array-based genotypes and imputed genotypes was 99.1%. After downsampling the coverage to 0.5x, 0.2x and 0.1x, the concordance between array-based genotypes and imputed genotypes was 98.7%, 97.8% and 96.7%, respectively. In the USDA odd-year breeding population, the concordance between array-based genotypes and imputed genotypes was 97.8% for 109 fish downsampled to 0.5x coverage. Therefore, the reference haplotype panel reported in this study can be used to accurately impute genotypes from lcWGS data in rainbow trout breeding populations.
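A small sketch of the concordance metric reported here: the proportion of genotypes at which the imputed calls agree with the SNP-array calls, overall and per SNP. The dosage coding (0/1/2) and the simulated arrays are assumptions for illustration; real use would read genotypes from the imputed VCFs and array files.

```python
import numpy as np

rng = np.random.default_rng(1)
array_geno = rng.integers(0, 3, size=(90, 50_000))     # fish x SNPs, 0/1/2 dosages
imputed_geno = array_geno.copy()
flip = rng.random(array_geno.shape) < 0.01              # simulate ~1% imputation errors
imputed_geno[flip] = (imputed_geno[flip] + 1) % 3

# Overall concordance, and per-SNP concordance for downstream filtering.
overall = (imputed_geno == array_geno).mean()
per_snp = (imputed_geno == array_geno).mean(axis=0)
print(f"overall concordance: {overall:.3%}")
print("worst 5 SNPs:", np.sort(per_snp)[:5])
```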

18.
Pharm Stat ; 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39013479

ABSTRACT

The ICH E9(R1) Addendum (International Council for Harmonization 2019) suggests treatment-policy as one of several strategies for addressing intercurrent events such as treatment withdrawal when defining an estimand. This strategy requires the monitoring of patients and collection of primary outcome data following termination of randomised treatment. However, when patients withdraw from a study early, before completion, this creates true missing data, complicating the analysis. One possible way forward uses multiple imputation to replace the missing data based on a model for outcome on- and off-treatment prior to study withdrawal, often referred to as retrieved dropout multiple imputation. This article introduces a novel approach to parameterising this imputation model so that those parameters which may be difficult to estimate have mildly informative Bayesian priors applied during the imputation stage. A core reference-based model is combined with a retrieved dropout compliance model, using both on- and off-treatment data, to form an extended model for the purposes of imputation. This alleviates the problem of specifying a complex set of analysis rules to accommodate situations where parameters which influence the estimated value are not estimable, or are poorly estimated, leading to unrealistically large standard errors in the resulting analysis. We refer to this new approach as retrieved dropout reference-base centred multiple imputation.
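A drastically simplified sketch of the reference-based core idea only: missing outcomes for active-arm patients who withdraw are drawn from a model fitted to the reference (control) arm. This is jump-to-reference-style imputation on simulated data, not the paper's retrieved dropout model with mildly informative priors, and it ignores parameter uncertainty for brevity.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 300
df = pd.DataFrame({"arm": rng.integers(0, 2, n),
                   "baseline": rng.normal(0, 1, n)})
df["outcome"] = df["baseline"] + 1.0 * df["arm"] + rng.normal(0, 1, n)
# Some active-arm patients withdraw and have missing outcomes.
withdrew = (df["arm"] == 1) & (rng.random(n) < 0.2)
df.loc[withdrew, "outcome"] = np.nan

# Fit the reference-arm regression of outcome on baseline.
ref = df[df["arm"] == 0]
beta = np.polyfit(ref["baseline"], ref["outcome"], 1)
resid_sd = np.std(ref["outcome"] - np.polyval(beta, ref["baseline"]))

estimates = []
for m in range(50):                                # multiple imputations
    d = df.copy()
    miss = d["outcome"].isna()
    d.loc[miss, "outcome"] = (np.polyval(beta, d.loc[miss, "baseline"])
                              + rng.normal(0, resid_sd, miss.sum()))
    estimates.append(d.groupby("arm")["outcome"].mean().diff().iloc[-1])
print("pooled treatment effect:", np.mean(estimates))
```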

19.
Glob Chang Biol ; 30(7): e17399, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39007251

ABSTRACT

The ever-increasing and expanding globalisation of trade and transport underpins the escalating global problem of biological invasions. Developing biosecurity infrastructures is crucial to anticipate and prevent the transport and introduction of invasive alien species. Still, robust and defensible forecasts of potential invaders are rare, especially for species without known invasion history. Here, we aim to support decision-making by developing a quantitative invasion risk assessment tool based on invasion syndromes (i.e., generalising typical attributes of invasive alien species). We implemented a workflow based on 'Multiple Imputation with Chain Equation' to estimate invasion syndromes from imputed datasets of species' life-history and ecological traits and macroecological patterns. Importantly, our models disentangle the factors explaining (i) transport and introduction and (ii) establishment. We showcase our tool by modelling the invasion syndromes of 466 amphibians and reptile species with invasion history. Then, we project these models to amphibians and reptiles worldwide (16,236 species [c.76% global coverage]) to identify species with a risk of being unintentionally transported and introduced, and risk of establishing alien populations. Our invasion syndrome models showed high predictive accuracy with a good balance between specificity and generality. Unintentionally transported and introduced species tend to be common and thrive well in human-disturbed habitats. In contrast, those with established alien populations tend to be large-sized, are habitat generalists, thrive well in human-disturbed habitats, and have large native geographic ranges. We forecast that 160 amphibians and reptiles without known invasion history could be unintentionally transported and introduced in the future. Among them, 57 species have a high risk of establishing alien populations. Our reliable, reproducible, transferable, statistically robust and scientifically defensible quantitative invasion risk assessment tool is a significant new addition to the suite of decision-support tools needed for developing a future-proof preventative biosecurity globally.


Subject(s)
Amphibians; Forecasting; Introduced Species; Reptiles; Animals; Reptiles/physiology; Amphibians/physiology; Risk Assessment/methods; Models, Theoretical; Models, Biological
20.
Mamm Genome ; 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-39028337

ABSTRACT

Ancient DNA provides a unique frame for directly studying human population genetics in time and space. Still, since most ancient genomic data is low coverage, analysis is confronted with a low number of SNPs, genotype uncertainties, and reference bias. Here, we benchmark for the first time two distinct versions of the GLIMPSE tools on 120 ancient human genomes from Eurasia, largely from previously under-evaluated regions, and compare the performance of genotype imputation with de facto standard approaches for low-coverage genomic data analysis. We further investigate the impact of two distinct reference panels on imputation accuracy for low-coverage genomic data. We compute accuracy statistics and perform PCA and f4-statistics to explore the behaviour of genotype imputation on low-coverage data with respect to (i) the two versions of GLIMPSE, (ii) the two reference panels, (iii) four post-imputation filters and coverages, and (iv) the data type and geographical origin of the samples. Our results reveal that genotype imputation using GLIMPSE v2 is suitable even for 0.1X-coverage ancient human genomes. Additionally, using the 1000 Genomes panel merged with the Human Genome Diversity Panel improves imputation accuracy for rare variants with low MAF, which may be important not only for ancient genomics but also for modern human genomic studies based on low-coverage data and for haplotype-based analyses. Most importantly, we show that genotype imputation of low-coverage ancient human genomes reduces the genetic affinity of the samples towards the human reference genome. By addressing one of the most challenging biases in data analysis, the so-called reference bias, genotype imputation using GLIMPSE v2 is promising for low-coverage ancient human genomic data analysis and for rare-variant-based and haplotype-based analyses.
