Search | VHL Regional Portal

1.

Propensity weighting plus adjustment in proportional hazards model is not doubly robust.

Gabriel, Erin E; Sachs, Michael C; Waernbaum, Ingeborg; Goetghebeur, Els; Blanche, Paul F; Vansteelandt, Stijn; Sjölander, Arvid; Scheike, Thomas.

Biometrics ; 80(3)2024 Jul 01.

Article in English | MEDLINE | ID: mdl-39036984

ABSTRACT

Recently, it has become common for applied works to combine commonly used survival analysis modeling methods, such as the multivariable Cox model and propensity score weighting, with the intention of forming a doubly robust estimator of an exposure effect hazard ratio that is unbiased in large samples when either the Cox model or the propensity score model is correctly specified. This combination does not, in general, produce a doubly robust estimator, even after regression standardization, when there is truly a causal effect. We demonstrate via simulation this lack of double robustness for the semiparametric Cox model, the Weibull proportional hazards model, and a simple proportional hazards flexible parametric model, with both the latter models fit via maximum likelihood. We provide a novel proof that the combination of propensity score weighting and a proportional hazards survival model, fit either via full or partial likelihood, is consistent under the null of no causal effect of the exposure on the outcome under particular censoring mechanisms if either the propensity score or the outcome model is correctly specified and contains all confounders. Given our results suggesting that double robustness only exists under the null, we outline 2 simple alternative estimators that are doubly robust for the survival difference at a given time point (in the above sense), provided the censoring mechanism can be correctly modeled, and one doubly robust method of estimation for the full survival curve. We provide R code to use these estimators for estimation and inference in the supporting information.

Subject(s)

Computer Simulation , Propensity Score , Proportional Hazards Models , Humans , Survival Analysis , Likelihood Functions , Biometry/methods

2.

A robust cis-Mendelian randomization method with application to drug target discovery.

Lin, Zhaotong; Pan, Wei.

Nat Commun ; 15(1): 6072, 2024 Jul 18.

Article in English | MEDLINE | ID: mdl-39025905

ABSTRACT

Mendelian randomization (MR) uses genetic variants as instrumental variables (IVs) to investigate causal relationships between traits. Unlike conventional MR, cis-MR focuses on a single genomic region using only cis-SNPs. For example, using cis-pQTLs for a protein as exposure for a disease opens a cost-effective path for drug target discovery. However, few methods effectively handle pleiotropy and linkage disequilibrium (LD) of cis-SNPs. Here, we propose cisMR-cML, a method based on constrained maximum likelihood, robust to IV assumption violations with strong theoretical support. We further clarify the severe but largely neglected consequences of the current practice of modeling marginal, instead of conditional genetic effects, and only using exposure-associated SNPs in cis-MR analysis. Numerical studies demonstrated our method's superiority over other existing methods. In a drug-target analysis for coronary artery disease (CAD), including a proteome-wide application, we identified three potential drug targets, PCSK9, COLEC11 and FGFR1 for CAD.

Subject(s)

Drug Discovery , Linkage Disequilibrium , Mendelian Randomization Analysis , Polymorphism, Single Nucleotide , Humans , Drug Discovery/methods , Coronary Artery Disease/genetics , Coronary Artery Disease/drug therapy , Proprotein Convertase 9/genetics , Proprotein Convertase 9/metabolism , Genetic Pleiotropy , Genome-Wide Association Study/methods , Quantitative Trait Loci , Likelihood Functions

3.

Maximum Likelihood Estimation for Unrooted 3-Leaf Trees: An Analytic Solution for the CFN Model.

Hill, Max; Roch, Sebastien; Rodriguez, Jose Israel.

Bull Math Biol ; 86(9): 106, 2024 Jul 12.

Article in English | MEDLINE | ID: mdl-38995457

ABSTRACT

Maximum likelihood estimation is among the most widely-used methods for inferring phylogenetic trees from sequence data. This paper solves the problem of computing solutions to the maximum likelihood problem for 3-leaf trees under the 2-state symmetric mutation model (CFN model). Our main result is a closed-form solution to the maximum likelihood problem for unrooted 3-leaf trees, given generic data; this result characterizes all of the ways that a maximum likelihood estimate can fail to exist for generic data and provides theoretical validation for predictions made in Parks and Goldman (Syst Biol 63(5):798-811, 2014). Our proof makes use of both classical tools for studying group-based phylogenetic models such as Hadamard conjugation and reparameterization in terms of Fourier coordinates, as well as more recent results concerning the semi-algebraic constraints of the CFN model. To be able to put these into practice, we also give a complete characterization to test genericity.

Subject(s)

Mathematical Concepts , Models, Genetic , Mutation , Phylogeny , Likelihood Functions , Algorithms

4.

Comparison of WAIC and posterior predictive approaches for N-mixture models.

Gaya, Heather E; Ketz, Alison C.

Sci Rep ; 14(1): 15743, 2024 07 08.

Article in English | MEDLINE | ID: mdl-38977791

ABSTRACT

Hierarchical models are common for ecological analysis, but determining appropriate model selection methods remains an ongoing challenge. To confront this challenge, a suitable method is needed to evaluate and compare available candidate models. We compared performance of conditional WAIC, a joint-likelihood approach to WAIC (WAICj), and posterior-predictive loss for selecting between candidate N-mixture models. We tested these model selection criteria on simulated single-season N-mixture models, simulated multi-season N-mixture models with temporal auto-correlation, and three case studies of single-season N-mixture models based on eBird data. WAICj proved more accurate than the standard conditional formulation or posterior-predictive loss, even when models were temporally correlated, suggesting WAICj is a robust alternative to model selection for N-mixture models.

Subject(s)

Models, Statistical , Likelihood Functions , Computer Simulation , Seasons , Animals

5.

Bayesian Analysis of Multi-Factorial Experimental Designs Using SEM.

Langenberg, Benedikt; Helm, Jonathan L; Mayer, Axel.

Multivariate Behav Res ; 59(4): 716-737, 2024.

Article in English | MEDLINE | ID: mdl-38984637

ABSTRACT

Latent repeated measures ANOVA (L-RM-ANOVA) has recently been proposed as an alternative to traditional repeated measures ANOVA. L-RM-ANOVA builds upon structural equation modeling and enables researchers to investigate interindividual differences in main/interaction effects, examine custom contrasts, incorporate a measurement model, and account for missing data. However, L-RM-ANOVA uses maximum likelihood and thus cannot incorporate prior information and can have poor statistical properties in small samples. We show how L-RM-ANOVA can be used with Bayesian estimation to resolve the aforementioned issues. We demonstrate how to place informative priors on model parameters that constitute main and interaction effects. We further show how to place weakly informative priors on standardized parameters which can be used when no prior information is available. We conclude that Bayesian estimation can lower Type 1 error and bias, and increase power and efficiency when priors are chosen adequately. We demonstrate the approach using a real empirical example and guide the readers through specification of the model. We argue that ANOVA tables and incomplete descriptive statistics are not sufficient information to specify informative priors, and we identify which parameter estimates should be reported in future research; thereby promoting cumulative research.

Subject(s)

Bayes Theorem , Humans , Analysis of Variance , Research Design/statistics & numerical data , Models, Statistical , Data Interpretation, Statistical , Latent Class Analysis , Likelihood Functions

6.

Pairwise kinship inference and pedigree reconstruction using 91 microhaplotypes.

Wei, Yifan; Zhu, Qiang; Wang, Haoyu; Cao, Yueyan; Li, Xi; Zhang, Xiaokang; Wang, Yufang; Zhang, Ji.

Forensic Sci Int Genet ; 72: 103090, 2024 Sep.

Article in English | MEDLINE | ID: mdl-38968912

ABSTRACT

Kinship inference has been a major issue in forensic genetics, and it remains to be solved when there is no prior hypothesis and the relationships between multiple individuals are unknown. In this study, we genotyped 91 microhaplotypes from 46 pedigree samples using massive parallel sequencing and inferred their relatedness by calculating the likelihood ratio (LR). Based on simulated and real data, different treatments were applied in the presence and absence of relatedness assumptions. The pedigree of multiple individuals was reconstructed by calculating pedigree likelihoods based on real pedigree samples. The results showed that the 91 MHs could discriminate pairs of second-degree relatives from unrelated individuals. And more highly polymorphic loci were needed to discriminate the pairs of second-degree or more distant relative from other degrees of relationship, but correct classification could be obtained by expanding the suspected relationship searched to other relationships with lower LR values. Multiple individuals with unknown relationships can be successfully reconstructed if they are closely related. Our study provides a solution for kinship inference when there are no prior assumptions, and explores the possibility of pedigree reconstruction when the relationships of multiple individuals are unknown.

Subject(s)

Haplotypes , Pedigree , Humans , Likelihood Functions , High-Throughput Nucleotide Sequencing , Genotype , DNA Fingerprinting , Sequence Analysis, DNA , Polymorphism, Single Nucleotide , Forensic Genetics/methods , Male

7.

Measles Infection Dose Responses: Insights from Mathematical Modeling.

Anelone, Anet J N; Clapham, Hannah E.

Bull Math Biol ; 86(7): 85, 2024 Jun 09.

Article in English | MEDLINE | ID: mdl-38853189

ABSTRACT

How viral infections develop can change based on the number of viruses initially entering the body. The understanding of the impacts of infection doses remains incomplete, in part due to challenging constraints, and a lack of research. Gaining more insights is crucial regarding the measles virus (MV). The higher the MV infection dose, the earlier the peak of acute viremia, but the magnitude of the peak viremia remains almost constant. Measles is highly contagious, causes immunosuppression such as lymphopenia, and contributes substantially to childhood morbidity and mortality. This work investigated mechanisms underlying the observed wild-type measles infection dose responses in cynomolgus monkeys. We fitted longitudinal data on viremia using maximum likelihood estimation, and used the Akaike Information Criterion (AIC) to evaluate relevant biological hypotheses and their respective model parameterizations. The lowest AIC indicates a linear relationship between the infection dose, the initial viral load, and the initial number of activated MV-specific T cells. Early peak viremia is associated with high initial number of activated MV-specific T cells. Thus, when MV infection dose increases, the initial viremia and associated immune cell stimulation increase, and reduce the time it takes for T cell killing to be sufficient, thereby allowing dose-independent peaks for viremia, MV-specific T cells, and lymphocyte depletion. Together, these results suggest that the development of measles depends on virus-host interactions at the start and the efficiency of viral control by cellular immunity. These relationships are additional motivations for prevention, vaccination, and early treatment for measles.

Subject(s)

Macaca fascicularis , Mathematical Concepts , Measles virus , Measles , Viral Load , Viremia , Measles/immunology , Measles/transmission , Measles/prevention & control , Measles/virology , Measles/epidemiology , Animals , Viremia/immunology , Viremia/virology , Measles virus/immunology , Measles virus/pathogenicity , Measles virus/physiology , Likelihood Functions , Humans , Models, Immunological , Models, Biological , T-Lymphocytes/immunology , Lymphocyte Activation

8.

Extending the discussion on inconsistency in forensic decisions and results.

Buckleton, John; Bright, Jo-Anne; Taylor, Duncan; Curran, James; Kalafut, Tim.

J Forensic Sci ; 69(4): 1125-1137, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38853374

ABSTRACT

The subject of inter- and intra-laboratory inconsistency was recently raised in a commentary by Itiel Dror. We re-visit an inter-laboratory trial, with which some of the authors of this current discussion were associated, to diagnose the causes of any differences in the likelihood ratios (LRs) assigned using probabilistic genotyping software. Some of the variation was due to different decisions that would be made on a case-by-case basis, some due to laboratory policy and would hence differ between laboratories, and the final and smallest part was the run-to-run difference caused by the Monte Carlo aspect of the software used. However, the net variation in LRs was considerable. We believe that most laboratories will self-diagnose the cause of their difference from the majority answer and in some, but not all instances will take corrective action. An inter-laboratory exercise consisting of raw data files for relatively straightforward mixtures, such as two mixtures of three or four persons, would allow laboratories to calibrate their procedures and findings.

Subject(s)

Software , Humans , Likelihood Functions , Monte Carlo Method , DNA Fingerprinting , Genotype , Laboratories/standards , Decision Making , Forensic Genetics/methods

9.

A targeted likelihood estimation comparing cefepime and piperacillin/tazobactam in critically ill patients with community-acquired pneumonia (CAP).

Serrano-Mayorga, Cristian C; Duque, Sara; Ibáñez-Prada, Elsa D; Garcia-Gallo, Esteban; Arrieta, María P Rojas; Bastidas, Alirio; Rodríguez, Alejandro; Martin-Loeches, Ignacio; Reyes, Luis F.

Sci Rep ; 14(1): 13392, 2024 06 11.

Article in English | MEDLINE | ID: mdl-38862579

ABSTRACT

Cefepime and piperacillin/tazobactam are antimicrobials recommended by IDSA/ATS guidelines for the empirical management of patients admitted to the intensive care unit (ICU) with community-acquired pneumonia (CAP). Concerns have been raised about which should be used in clinical practice. This study aims to compare the effect of cefepime and piperacillin/tazobactam in critically ill CAP patients through a targeted maximum likelihood estimation (TMLE). A total of 2026 ICU-admitted patients with CAP were included. Among them, (47%) presented respiratory failure, and (27%) developed septic shock. A total of (68%) received cefepime and (32%) piperacillin/tazobactam-based treatment. After running the TMLE, we found that cefepime and piperacillin/tazobactam-based treatments have comparable 28-day, hospital, and ICU mortality. Additionally, age, PTT, serum potassium and temperature were associated with preferring cefepime over piperacillin/tazobactam (OR 1.14 95% CI [1.01-1.27], p = 0.03), (OR 1.14 95% CI [1.03-1.26], p = 0.009), (OR 1.1 95% CI [1.01-1.22], p = 0.039) and (OR 1.13 95% CI [1.03-1.24], p = 0.014)]. Our study found a similar mortality rate among ICU-admitted CAP patients treated with cefepime and piperacillin/tazobactam. Clinicians may consider factors such as availability and safety profiles when making treatment decisions.

Subject(s)

Anti-Bacterial Agents , Cefepime , Community-Acquired Infections , Critical Illness , Intensive Care Units , Piperacillin, Tazobactam Drug Combination , Humans , Cefepime/therapeutic use , Cefepime/administration & dosage , Community-Acquired Infections/drug therapy , Community-Acquired Infections/mortality , Piperacillin, Tazobactam Drug Combination/therapeutic use , Male , Female , Aged , Middle Aged , Anti-Bacterial Agents/therapeutic use , Likelihood Functions , Pneumonia/drug therapy , Pneumonia/mortality , Piperacillin/therapeutic use

10.

Unraveling Burst Selection Bias in Single-Molecule FRET of Species with Unequal Brightness and Diffusivity.

Gopich, Irina V; Chung, Hoi Sung.

J Phys Chem B ; 128(23): 5576-5589, 2024 Jun 13.

Article in English | MEDLINE | ID: mdl-38833567

ABSTRACT

Single-molecule free diffusion experiments enable accurate quantification of coexisting species or states. However, unequal brightness and diffusivity introduce a burst selection bias and affect the interpretation of experimental results. We address this issue with a photon-by-photon maximum likelihood method, burstML, which explicitly considers burst selection criteria. BurstML accurately estimates parameters, including photon count rates, diffusion times, Förster resonance energy transfer (FRET) efficiencies, and population, even in cases where species are poorly distinguished in FRET efficiency histograms. We develop a quantitative theory that determines the fraction of photon bursts corresponding to each species and thus obtain accurate species populations from the measured burst fractions. In addition, we provide a simple approximate formula for burst fractions and establish the range of parameters where unequal brightness and diffusivity can significantly affect the results obtained by conventional methods. The performance of the burstML method is compared with that of a maximum likelihood method that assumes equal species brightness and diffusivity, as well as standard Gaussian fitting of FRET efficiency histograms, using both simulated and real single-molecule data for cold-shock protein, protein L, and protein G. The burstML method enhances the accuracy of parameter estimation in single-molecule fluorescence studies.

Subject(s)

Fluorescence Resonance Energy Transfer , Diffusion , Photons , Likelihood Functions , Single Molecule Imaging/methods

11.

Effects of different observed datasets on the calibration of crop model parameters with GLUE: A case study using the CROPGRO-Soybean phenological model.

Zhang, Yonghui; Zhang, Yujie; Jiang, Haiyan; Tang, Liang; Liu, Xiaojun; Cao, Weixing; Zhu, Yan.

PLoS One ; 19(6): e0302098, 2024.

Article in English | MEDLINE | ID: mdl-38870135

ABSTRACT

Suitable combinations of observed datasets for estimating crop model parameters can reduce the computational cost while ensuring accuracy. This study aims to explore the quantitative influence of different combinations of the observed phenological stages on estimation of cultivar-specific parameters (CPSs). We used the CROPGRO-Soybean phenological model (CSPM) as a case study in combination with the Generalized Likelihood Uncertainty Estimation (GLUE) method. Different combinations of four observed phenological stages, including initial flowering, initial pod, initial grain, and initial maturity stages for five soybean cultivars from Exp. 1 and Exp. 3 described in Table 2 are respectively used to calibrate the CSPs. The CSPM, driven by the optimized CSPs, is then evaluated against two independent phenological datasets from Exp. 2 and Exp. 4 described in Table 2. Root means square error (RMSE) (mean absolute error (MAE), coefficient of determination (R2), and Nash Sutcliffe model efficiency (NSE)) are 15.50 (14.63, 0.96, 0.42), 4.76 (3.92, 0.97, 0.95), 4.69 (3.72, 0.98, 0.95), 3.91 (3.40, 0.99, 0.96) and 12.54 (11.67, 0.95, 0.60), 5.07 (4.61, 0.98, 0.93), 4.97 (4.28, 0.97, 0.94), 4.58 (4.02, 0.98, 0.95) for using one, two, three, and four observed phenological stages in the CSPs estimation. The evaluation results suggest that RMSE and MAE decrease, and R2 and NSE increase with the increase in the number of observed phenological stages used for parameter calibration. However, there is no significant reduction in the RMSEs (MAEs, NSEs) using two, three, and four observed stages. Relatively reliable optimized CSPs for CSMP are obtained by using at least two observed phenological stages balancing calibration effect and computational cost. These findings provide new insight into parameter estimation of crop models.

Subject(s)

Crops, Agricultural , Glycine max , Glycine max/growth & development , Crops, Agricultural/growth & development , Calibration , Models, Biological , Likelihood Functions , Uncertainty

12.

CMAPLE: Efficient Phylogenetic Inference in the Pandemic Era.

Ly-Trong, Nhan; Bielow, Chris; De Maio, Nicola; Minh, Bui Quang.

Mol Biol Evol ; 41(7)2024 Jul 03.

Article in English | MEDLINE | ID: mdl-38934791

ABSTRACT

We have recently introduced MAPLE (MAximum Parsimonious Likelihood Estimation), a new pandemic-scale phylogenetic inference method exclusively designed for genomic epidemiology. In response to the need for enhancing MAPLE's performance and scalability, here we present two key components: (i) CMAPLE software, a highly optimized C++ reimplementation of MAPLE with many new features and advancements, and (ii) CMAPLE library, a suite of application programming interfaces to facilitate the integration of the CMAPLE algorithm into existing phylogenetic inference packages. Notably, we have successfully integrated CMAPLE into the widely used IQ-TREE 2 software, enabling its rapid adoption in the scientific community. These advancements serve as a vital step toward better preparedness for future pandemics, offering researchers powerful tools for large-scale pathogen genomic analysis.

Subject(s)

Phylogeny , Software , Algorithms , Pandemics , Likelihood Functions , Humans

13.

Robust analysis of stepped wedge trials using composite likelihood models.

Voldal, Emily C; Kenny, Avi; Xia, Fan; Heagerty, Patrick; Hughes, James P.

Stat Med ; 43(17): 3326-3352, 2024 Jul 30.

Article in English | MEDLINE | ID: mdl-38837431

ABSTRACT

Stepped wedge trials (SWTs) are a type of cluster randomized trial that involve repeated measures on clusters and design-induced confounding between time and treatment. Although mixed models are commonly used to analyze SWTs, they are susceptible to misspecification particularly for cluster-longitudinal designs such as SWTs. Mixed model estimation leverages both "horizontal" or within-cluster information and "vertical" or between-cluster information. To use horizontal information in a mixed model, both the mean model and correlation structure must be correctly specified or accounted for, since time is confounded with treatment and measurements are likely correlated within clusters. Alternative non-parametric methods have been proposed that use only vertical information; these are more robust because between-cluster comparisons in a SWT preserve randomization, but these non-parametric methods are not very efficient. We propose a composite likelihood method that focuses on vertical information, but has the flexibility to recover efficiency by using additional horizontal information. We compare the properties and performance of various methods, using simulations based on COVID-19 data and a demonstration of application to the LIRE trial. We found that a vertical composite likelihood model that leverages baseline data is more robust than traditional methods, and more efficient than methods that use only vertical information. We hope that these results demonstrate the potential value of model-based vertical methods for SWTs with a large number of clusters, and that these new tools are useful to researchers who are concerned about misspecification of traditional models.

Subject(s)

Randomized Controlled Trials as Topic , Humans , Likelihood Functions , Randomized Controlled Trials as Topic/methods , Randomized Controlled Trials as Topic/statistics & numerical data , Cluster Analysis , Computer Simulation , Models, Statistical , COVID-19 , Research Design

14.

Bayesian mixture modelling with ranked set samples.

Alvandi, Amirhossein; Omidvar, Sedigheh; Hatefi, Armin; Jafari Jozani, Mohammad; Ozturk, Omer; Nematollahi, Nader.

Stat Med ; 43(19): 3723-3741, 2024 Aug 30.

Article in English | MEDLINE | ID: mdl-38890118

ABSTRACT

We consider the Bayesian estimation of the parameters of a finite mixture model from independent order statistics arising from imperfect ranked set sampling designs. As a cost-effective method, ranked set sampling enables us to incorporate easily attainable characteristics, as ranking information, into data collection and Bayesian estimation. To handle the special structure of the ranked set samples, we develop a Bayesian estimation approach exploiting the Expectation-Maximization (EM) algorithm in estimating the ranking parameters and Metropolis within Gibbs Sampling to estimate the parameters of the underlying mixture model. Our findings show that the proposed RSS-based Bayesian estimation method outperforms the commonly used Bayesian counterpart using simple random sampling. The developed method is finally applied to estimate the bone disorder status of women aged 50 and older.

Subject(s)

Algorithms , Bayes Theorem , Models, Statistical , Humans , Female , Middle Aged , Aged , Computer Simulation , Monte Carlo Method , Likelihood Functions , Markov Chains

15.

The Tree Reconstruction Game: Phylogenetic Reconstruction Using Reinforcement Learning.

Azouri, Dana; Granit, Oz; Alburquerque, Michael; Mansour, Yishay; Pupko, Tal; Mayrose, Itay.

Mol Biol Evol ; 41(6)2024 Jun 01.

Article in English | MEDLINE | ID: mdl-38829798

ABSTRACT

The computational search for the maximum-likelihood phylogenetic tree is an NP-hard problem. As such, current tree search algorithms might result in a tree that is the local optima, not the global one. Here, we introduce a paradigm shift for predicting the maximum-likelihood tree, by approximating long-term gains of likelihood rather than maximizing likelihood gain at each step of the search. Our proposed approach harnesses the power of reinforcement learning to learn an optimal search strategy, aiming at the global optimum of the search space. We show that when analyzing empirical data containing dozens of sequences, the log-likelihood improvement from the starting tree obtained by the reinforcement learning-based agent was 0.969 or higher compared to that achieved by current state-of-the-art techniques. Notably, this performance is attained without the need to perform costly likelihood optimizations apart from the training process, thus potentially allowing for an exponential increase in runtime. We exemplify this for data sets containing 15 sequences of length 18,000 bp and demonstrate that the reinforcement learning-based method is roughly three times faster than the state-of-the-art software. This study illustrates the potential of reinforcement learning in addressing the challenges of phylogenetic tree reconstruction.

Subject(s)

Algorithms , Phylogeny , Likelihood Functions , Models, Genetic , Computational Biology/methods , Software

16.

A machine-learning-based alternative to phylogenetic bootstrap.

Ecker, Noa; Huchon, Dorothée; Mansour, Yishay; Mayrose, Itay; Pupko, Tal.

Bioinformatics ; 40(Supplement_1): i208-i217, 2024 Jun 28.

Article in English | MEDLINE | ID: mdl-38940166

ABSTRACT

MOTIVATION: Currently used methods for estimating branch support in phylogenetic analyses often rely on the classic Felsenstein's bootstrap, parametric tests, or their approximations. As these branch support scores are widely used in phylogenetic analyses, having accurate, fast, and interpretable scores is of high importance. RESULTS: Here, we employed a data-driven approach to estimate branch support values with a probabilistic interpretation. To this end, we simulated thousands of realistic phylogenetic trees and the corresponding multiple sequence alignments. Each of the obtained alignments was used to infer the phylogeny using state-of-the-art phylogenetic inference software, which was then compared to the true tree. Using these extensive data, we trained machine-learning algorithms to estimate branch support values for each bipartition within the maximum-likelihood trees obtained by each software. Our results demonstrate that our model provides fast and more accurate probability-based branch support values than commonly used procedures. We demonstrate the applicability of our approach on empirical datasets. AVAILABILITY AND IMPLEMENTATION: The data supporting this work are available in the Figshare repository at https://doi.org/10.6084/m9.figshare.25050554.v1, and the underlying code is accessible via GitHub at https://github.com/noaeker/bootstrap_repo.

Subject(s)

Algorithms , Machine Learning , Phylogeny , Software , Sequence Alignment/methods , Computational Biology/methods , Likelihood Functions

17.

Maximum likelihood phylogeographic inference of cell motility and cell division from spatial lineage tracing data.

Mai, Uyen; Hu, Gary; Raphael, Benjamin J.

Bioinformatics ; 40(Supplement_1): i228-i236, 2024 Jun 28.

Article in English | MEDLINE | ID: mdl-38940146

ABSTRACT

MOTIVATION: Recently developed spatial lineage tracing technologies induce somatic mutations at specific genomic loci in a population of growing cells and then measure these mutations in the sampled cells along with the physical locations of the cells. These technologies enable high-throughput studies of developmental processes over space and time. However, these applications rely on accurate reconstruction of a spatial cell lineage tree describing both past cell divisions and cell locations. Spatial lineage trees are related to phylogeographic models that have been well-studied in the phylogenetics literature. We demonstrate that standard phylogeographic models based on Brownian motion are inadequate to describe the spatial symmetric displacement (SD) of cells during cell division. RESULTS: We introduce a new model-the SD model for cell motility that includes symmetric displacements of daughter cells from the parental cell followed by independent diffusion of daughter cells. We show that this model more accurately describes the locations of cells in a real spatial lineage tracing of mouse embryonic stem cells. Combining the spatial SD model with an evolutionary model of DNA mutations, we obtain a phylogeographic model for spatial lineage tracing. Using this model, we devise a maximum likelihood framework-MOLLUSC (Maximum Likelihood Estimation Of Lineage and Location Using Single-Cell Spatial Lineage tracing Data)-to co-estimate time-resolved branch lengths, spatial diffusion rate, and mutation rate. On both simulated and real data, we show that MOLLUSC accurately estimates all parameters. In contrast, the Brownian motion model overestimates spatial diffusion rate in all test cases. In addition, the inclusion of spatial information improves accuracy of branch length estimation compared to sequence data alone. On real data, we show that spatial information has more signal than sequence data for branch length estimation, suggesting augmenting lineage tracing technologies with spatial information is useful to overcome the limitations of genome-editing in developmental systems. AVAILABILITY AND IMPLEMENTATION: The python implementation of MOLLUSC is available at https://github.com/raphael-group/MOLLUSC.

Subject(s)

Cell Division , Cell Lineage , Cell Movement , Animals , Mice , Likelihood Functions , Phylogeography , Mutation , Phylogeny

18.

Treatments for undefined log ratios in matching analyses.

Caron, Pier-Olivier.

J Exp Anal Behav ; 122(1): 52-61, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38837760

ABSTRACT

A challenge in carrying out matching analyses is to deal with undefined log ratios. If any reinforcer or response rate equals zero, the logarithm of the ratio is undefined: data are unsuitable for analyses. There have been some tentative solutions, but they had not been thoroughly investigated. The purpose of this article is to assess the adequacy of five treatments: omit undefined ratios, use full information maximum likelihood, replace undefined ratios by the mean divided by 100, replace them by a constant 1/10, and add the constant .50 to ratios. Based on simulations, the treatments are compared on their estimations of variance accounted for, sensitivity, and bias. The results show that full information maximum likelihood and omiting undefined ratios had the best overall performance, with negligibly biased and more accurate estimates than mean divided by 100, constant 1/10, and constant .50. The study suggests that mean divided by 100, constant 1/10, and constant .50 should be avoided and recommends full information maximum likelihood to deal with undefined log ratios in matching analyses.

Subject(s)

Reinforcement, Psychology , Likelihood Functions , Animals , Data Interpretation, Statistical , Conditioning, Operant , Computer Simulation , Humans , Reinforcement Schedule

19.

Measures of fragmentation of rest activity patterns: mathematical properties and interpretability based on accelerometer real life data.

Danilevicz, Ian Meneghel; van Hees, Vincent Theodoor; van der Heide, Frank C T; Jacob, Louis; Landré, Benjamin; Benadjaoud, Mohamed Amine; Sabia, Séverine.

BMC Med Res Methodol ; 24(1): 132, 2024 Jun 07.

Article in English | MEDLINE | ID: mdl-38849718

ABSTRACT

Accelerometers, devices that measure body movements, have become valuable tools for studying the fragmentation of rest-activity patterns, a core circadian rhythm dimension, using metrics such as inter-daily stability (IS), intradaily variability (IV), transition probability (TP), and self-similarity parameter (named α ). However, their use remains mainly empirical. Therefore, we investigated the mathematical properties and interpretability of rest-activity fragmentation metrics by providing mathematical proofs for the ranges of IS and IV, proposing maximum likelihood and Bayesian estimators for TP, introducing the activity balance index (ABI) metric, a transformation of α , and describing distributions of these metrics in real-life setting. Analysis of accelerometer data from 2,859 individuals (age=60-83 years, 21.1% women) from the Whitehall II cohort (UK) shows modest correlations between the metrics, except for ABI and α . Sociodemographic (age, sex, education, employment status) and clinical (body mass index (BMI), and number of morbidities) factors were associated with these metrics, with differences observed according to metrics. For example, a difference of 5 units in BMI was associated with all metrics (differences ranging between -0.261 (95% CI -0.302, -0.220) to 0.228 (0.18, 0.268) for standardised TP rest to activity during the awake period and TP activity to rest during the awake period, respectively). These results reinforce the value of these rest-activity fragmentation metrics in epidemiological and clinical studies to examine their role for health. This paper expands on a set of methods that have previously demonstrated empirical value, improves the theoretical foundation for these methods, and evaluates their empirical use in a large dataset.

Subject(s)

Accelerometry , Rest , Humans , Female , Aged , Male , Accelerometry/methods , Accelerometry/statistics & numerical data , Middle Aged , Rest/physiology , Aged, 80 and over , Bayes Theorem , Body Mass Index , Circadian Rhythm/physiology , Likelihood Functions , Motor Activity/physiology

20.

Penalized Regression Methods With Modified Cross-Validation and Bootstrap Tuning Produce Better Prediction Models.

Pavlou, Menelaos; Omar, Rumana Z; Ambler, Gareth.

Biom J ; 66(5): e202300245, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38922968

ABSTRACT

Risk prediction models fitted using maximum likelihood estimation (MLE) are often overfitted resulting in predictions that are too extreme and a calibration slope (CS) less than 1. Penalized methods, such as Ridge and Lasso, have been suggested as a solution to this problem as they tend to shrink regression coefficients toward zero, resulting in predictions closer to the average. The amount of shrinkage is regulated by a tuning parameter, λ , $\lambda ,$ commonly selected via cross-validation ("standard tuning"). Though penalized methods have been found to improve calibration on average, they often over-shrink and exhibit large variability in the selected λ $\lambda $ and hence the CS. This is a problem, particularly for small sample sizes, but also when using sample sizes recommended to control overfitting. We consider whether these problems are partly due to selecting λ $\lambda $ using cross-validation with "training" datasets of reduced size compared to the original development sample, resulting in an over-estimation of λ $\lambda $ and, hence, excessive shrinkage. We propose a modified cross-validation tuning method ("modified tuning"), which estimates λ $\lambda $ from a pseudo-development dataset obtained via bootstrapping from the original dataset, albeit of larger size, such that the resulting cross-validation training datasets are of the same size as the original dataset. Modified tuning can be easily implemented in standard software and is closely related to bootstrap selection of the tuning parameter ("bootstrap tuning"). We evaluated modified and bootstrap tuning for Ridge and Lasso in simulated and real data using recommended sample sizes, and sizes slightly lower and higher. They substantially improved the selection of λ $\lambda $ , resulting in improved CS compared to the standard tuning method. They also improved predictions compared to MLE.

Subject(s)

Biometry , Models, Statistical , Biometry/methods , Regression Analysis , Humans , Likelihood Functions

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL