Results 1 - 20 of 24,112
1.
Bull Math Biol ; 86(9): 106, 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38995457

ABSTRACT

Maximum likelihood estimation is among the most widely-used methods for inferring phylogenetic trees from sequence data. This paper solves the problem of computing solutions to the maximum likelihood problem for 3-leaf trees under the 2-state symmetric mutation model (CFN model). Our main result is a closed-form solution to the maximum likelihood problem for unrooted 3-leaf trees, given generic data; this result characterizes all of the ways that a maximum likelihood estimate can fail to exist for generic data and provides theoretical validation for predictions made in Parks and Goldman (Syst Biol 63(5):798-811, 2014). Our proof makes use of both classical tools for studying group-based phylogenetic models such as Hadamard conjugation and reparameterization in terms of Fourier coordinates, as well as more recent results concerning the semi-algebraic constraints of the CFN model. To be able to put these into practice, we also give a complete characterization to test genericity.
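As a rough illustration of the Fourier-coordinate viewpoint mentioned in the abstract (this is not the paper's closed-form result, which also characterizes the cases where no estimate exists), the sketch below inverts pairwise agreement frequencies on a 3-leaf CFN tree. It assumes the usual parameterization theta_e = 1 - 2*p_e, where p_e is the probability of a state change on edge e, so that two leaves agree with probability (1 + theta_i*theta_j)/2. The function name and its inputs are hypothetical.

```python
import math

def cfn_three_leaf_estimate(a12, a13, a23):
    """Return (theta1, theta2, theta3) with theta_e = 1 - 2*p_e, or None.

    a_ij is the observed fraction of sites at which leaves i and j agree.
    Only the interior case is handled; boundary cases return None.
    """
    # Empirical products q_ij = theta_i * theta_j implied by the agreement rates.
    q12, q13, q23 = 2 * a12 - 1, 2 * a13 - 1, 2 * a23 - 1
    if min(q12, q13, q23) <= 0:
        return None  # a square root below would be undefined or zero
    thetas = (
        math.sqrt(q12 * q13 / q23),
        math.sqrt(q12 * q23 / q13),
        math.sqrt(q13 * q23 / q12),
    )
    if any(t > 1 for t in thetas):
        return None  # would imply a negative change probability
    return thetas

print(cfn_three_leaf_estimate(0.74, 0.70, 0.65))
```

When all three values lie in (0, 1], the fitted site-pattern probabilities reproduce the observed ones, which is why the inversion maximizes the multinomial likelihood in that case; the `None` branches are exactly where a more careful analysis, such as the one the abstract describes, is needed.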


Subjects
Mathematical Concepts; Models, Genetic; Mutation; Phylogeny; Likelihood Functions; Algorithms
2.
Sci Rep ; 14(1): 15743, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38977791

ABSTRACT

Hierarchical models are common for ecological analysis, but determining appropriate model selection methods remains an ongoing challenge. To confront this challenge, a suitable method is needed to evaluate and compare available candidate models. We compared performance of conditional WAIC, a joint-likelihood approach to WAIC (WAICj), and posterior-predictive loss for selecting between candidate N-mixture models. We tested these model selection criteria on simulated single-season N-mixture models, simulated multi-season N-mixture models with temporal auto-correlation, and three case studies of single-season N-mixture models based on eBird data. WAICj proved more accurate than the standard conditional formulation or posterior-predictive loss, even when models were temporally correlated, suggesting WAICj is a robust alternative to model selection for N-mixture models.
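For readers unfamiliar with WAIC, the following is a minimal sketch of the standard computation from a matrix of pointwise log-likelihoods. It is not the authors' code. The input layout `log_lik[s][i]` (posterior draw s, data unit i) is an assumption; whether a "data unit" is a single count conditional on latent abundance or a whole site treated jointly is the modelling choice the abstract compares, and the sketch is agnostic about it.

```python
import math
from statistics import variance

def waic(log_lik):
    """WAIC on the deviance scale from log_lik[s][i] = log p(y_i | theta_s)."""
    n_draws = len(log_lik)
    n_units = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters
    for i in range(n_units):
        column = [log_lik[s][i] for s in range(n_draws)]
        m = max(column)  # log-sum-exp shift for numerical stability
        lppd += m + math.log(sum(math.exp(v - m) for v in column) / n_draws)
        p_waic += variance(column)  # sample variance across draws
    return -2.0 * (lppd - p_waic)

# Two posterior draws, two data units (hypothetical numbers).
print(waic([[-1.0, -2.0], [-1.2, -1.9]]))
```

Lower values indicate better expected predictive performance, so candidate models are ranked by the smallest WAIC.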


Subjects
Models, Statistical; Likelihood Functions; Computer Simulation; Seasons; Animals
3.
Nat Commun ; 15(1): 6072, 2024 Jul 18.
Article in English | MEDLINE | ID: mdl-39025905

ABSTRACT

Mendelian randomization (MR) uses genetic variants as instrumental variables (IVs) to investigate causal relationships between traits. Unlike conventional MR, cis-MR focuses on a single genomic region using only cis-SNPs. For example, using cis-pQTLs for a protein as exposure for a disease opens a cost-effective path for drug target discovery. However, few methods effectively handle pleiotropy and linkage disequilibrium (LD) of cis-SNPs. Here, we propose cisMR-cML, a method based on constrained maximum likelihood, robust to IV assumption violations with strong theoretical support. We further clarify the severe but largely neglected consequences of the current practice of modeling marginal, instead of conditional genetic effects, and only using exposure-associated SNPs in cis-MR analysis. Numerical studies demonstrated our method's superiority over other existing methods. In a drug-target analysis for coronary artery disease (CAD), including a proteome-wide application, we identified three potential drug targets, PCSK9, COLEC11 and FGFR1 for CAD.


Subjects
Drug Discovery; Linkage Disequilibrium; Mendelian Randomization Analysis; Polymorphism, Single Nucleotide; Humans; Drug Discovery/methods; Coronary Artery Disease/genetics; Coronary Artery Disease/drug therapy; Proprotein Convertase 9/genetics; Proprotein Convertase 9/metabolism; Genetic Pleiotropy; Genome-Wide Association Study/methods; Quantitative Trait Loci; Likelihood Functions
4.
Biometrics ; 80(3)2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39036984

ABSTRACT

Recently, it has become common for applied works to combine commonly used survival analysis modeling methods, such as the multivariable Cox model and propensity score weighting, with the intention of forming a doubly robust estimator of an exposure effect hazard ratio that is unbiased in large samples when either the Cox model or the propensity score model is correctly specified. This combination does not, in general, produce a doubly robust estimator, even after regression standardization, when there is truly a causal effect. We demonstrate via simulation this lack of double robustness for the semiparametric Cox model, the Weibull proportional hazards model, and a simple proportional hazards flexible parametric model, with both the latter models fit via maximum likelihood. We provide a novel proof that the combination of propensity score weighting and a proportional hazards survival model, fit either via full or partial likelihood, is consistent under the null of no causal effect of the exposure on the outcome under particular censoring mechanisms if either the propensity score or the outcome model is correctly specified and contains all confounders. Given our results suggesting that double robustness only exists under the null, we outline 2 simple alternative estimators that are doubly robust for the survival difference at a given time point (in the above sense), provided the censoring mechanism can be correctly modeled, and one doubly robust method of estimation for the full survival curve. We provide R code to use these estimators for estimation and inference in the supporting information.


Subjects
Computer Simulation; Propensity Score; Proportional Hazards Models; Humans; Survival Analysis; Likelihood Functions; Biometry/methods
5.
J Exp Anal Behav ; 122(1): 52-61, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38837760

ABSTRACT

A challenge in carrying out matching analyses is to deal with undefined log ratios. If any reinforcer or response rate equals zero, the logarithm of the ratio is undefined and the data are unsuitable for analysis. Some tentative solutions have been proposed, but they have not been thoroughly investigated. The purpose of this article is to assess the adequacy of five treatments: omit undefined ratios, use full information maximum likelihood, replace undefined ratios by the mean divided by 100, replace them by a constant 1/10, and add the constant .50 to ratios. Based on simulations, the treatments are compared on their estimations of variance accounted for, sensitivity, and bias. The results show that full information maximum likelihood and omitting undefined ratios had the best overall performance, with negligibly biased and more accurate estimates than mean divided by 100, constant 1/10, and constant .50. The study suggests that mean divided by 100, constant 1/10, and constant .50 should be avoided and recommends full information maximum likelihood to deal with undefined log ratios in matching analyses.
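To make the problem concrete, here is a small sketch of where undefined log ratios arise and of the simplest treatment named in the abstract, omitting them. It assumes the usual matching-analysis setup in which the log response ratio is regressed on the log reinforcer ratio; the function names and the data layout are hypothetical, and the other four treatments are not reproduced because the abstract does not specify them in enough detail.

```python
import math

def log_ratio(a, b):
    """Return log10(a / b), or None when either rate is zero (ratio undefined)."""
    if a <= 0 or b <= 0:
        return None
    return math.log10(a / b)

def omit_undefined(sessions):
    """Keep only the (x, y) points whose two log ratios both exist.

    sessions: iterable of ((r1, r2), (b1, b2)) with reinforcer rates r1, r2
    and response rates b1, b2 on the two alternatives.
    """
    points = []
    for (r1, r2), (b1, b2) in sessions:
        x, y = log_ratio(r1, r2), log_ratio(b1, b2)
        if x is not None and y is not None:
            points.append((x, y))
    return points

# The second session has a zero reinforcer rate, so it is dropped.
print(omit_undefined([((10, 5), (20, 10)), ((0, 5), (3, 9))]))
```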


Subjects
Reinforcement, Psychology; Likelihood Functions; Animals; Data Interpretation, Statistical; Conditioning, Operant; Computer Simulation; Humans; Reinforcement Schedule
6.
BMC Med Res Methodol ; 24(1): 140, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38943068

ABSTRACT

BACKGROUND: Longitudinal ordinal data are commonly analyzed using a marginal proportional odds model for relating ordinal outcomes to covariates in the biomedical and health sciences. The generalized estimating equation (GEE) consistently estimates the regression parameters of marginal models even if the working covariance structure is misspecified. For small-sample longitudinal binary data, recent studies have shown that the bias of regression parameters may result from the GEE and have addressed the issue by applying Firth's adjustment for the likelihood score equation to the GEE as if generalized estimating functions were likelihood score functions. In this manuscript, for the proportional odds model for longitudinal ordinal data, the small-sample properties of the GEE were investigated, and a bias-reduced GEE (BR-GEE) was derived. METHODS: By applying the adjusted function originally derived for the likelihood score function of the proportional odds model to the GEE, we produced the BR-GEE. We investigated the small-sample properties of both GEE and BR-GEE through simulation and applied them to a clinical study dataset. RESULTS: In simulation studies, the BR-GEE had a bias closer to zero, smaller root mean square error than the GEE with coverage probability of confidence interval near or above the nominal level. The simulation also showed that BR-GEE maintained a type I error rate near or below the nominal level. CONCLUSIONS: For the analysis of longitudinal ordinal data involving a small number of subjects, the BR-GEE is advantageous for obtaining estimates of the regression parameters of marginal proportional odds models.


Subjects
Bias; Humans; Longitudinal Studies; Likelihood Functions; Computer Simulation; Models, Statistical; Data Interpretation, Statistical; Sample Size; Algorithms
7.
J Forensic Sci ; 69(4): 1125-1137, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38853374

ABSTRACT

The subject of inter- and intra-laboratory inconsistency was recently raised in a commentary by Itiel Dror. We re-visit an inter-laboratory trial, with which some of the authors of this current discussion were associated, to diagnose the causes of any differences in the likelihood ratios (LRs) assigned using probabilistic genotyping software. Some of the variation was due to different decisions that would be made on a case-by-case basis, some due to laboratory policy and would hence differ between laboratories, and the final and smallest part was the run-to-run difference caused by the Monte Carlo aspect of the software used. However, the net variation in LRs was considerable. We believe that most laboratories will self-diagnose the cause of their difference from the majority answer and in some, but not all instances will take corrective action. An inter-laboratory exercise consisting of raw data files for relatively straightforward mixtures, such as two mixtures of three or four persons, would allow laboratories to calibrate their procedures and findings.


Subjects
Software; Humans; Likelihood Functions; Monte Carlo Method; DNA Fingerprinting; Genotype; Laboratories/standards; Decision Making; Forensic Genetics/methods
8.
Mol Biol Evol ; 41(6)2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38829798

ABSTRACT

The computational search for the maximum-likelihood phylogenetic tree is an NP-hard problem. As such, current tree search algorithms might return a tree that is a local optimum rather than the global one. Here, we introduce a paradigm shift for predicting the maximum-likelihood tree, by approximating long-term gains of likelihood rather than maximizing likelihood gain at each step of the search. Our proposed approach harnesses the power of reinforcement learning to learn an optimal search strategy, aiming at the global optimum of the search space. We show that when analyzing empirical data containing dozens of sequences, the log-likelihood improvement from the starting tree obtained by the reinforcement learning-based agent was 0.969 or higher compared to that achieved by current state-of-the-art techniques. Notably, this performance is attained without the need to perform costly likelihood optimizations apart from the training process, thus potentially allowing for an exponential increase in runtime. We exemplify this for data sets containing 15 sequences of length 18,000 bp and demonstrate that the reinforcement learning-based method is roughly three times faster than the state-of-the-art software. This study illustrates the potential of reinforcement learning in addressing the challenges of phylogenetic tree reconstruction.


Subjects
Algorithms; Phylogeny; Likelihood Functions; Models, Genetic; Computational Biology/methods; Software
9.
Bioinformatics ; 40(Supplement_1): i208-i217, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940166

ABSTRACT

MOTIVATION: Currently used methods for estimating branch support in phylogenetic analyses often rely on the classic Felsenstein's bootstrap, parametric tests, or their approximations. As these branch support scores are widely used in phylogenetic analyses, having accurate, fast, and interpretable scores is of high importance. RESULTS: Here, we employed a data-driven approach to estimate branch support values with a probabilistic interpretation. To this end, we simulated thousands of realistic phylogenetic trees and the corresponding multiple sequence alignments. Each of the obtained alignments was used to infer the phylogeny using state-of-the-art phylogenetic inference software, which was then compared to the true tree. Using these extensive data, we trained machine-learning algorithms to estimate branch support values for each bipartition within the maximum-likelihood trees obtained by each software. Our results demonstrate that our model provides fast and more accurate probability-based branch support values than commonly used procedures. We demonstrate the applicability of our approach on empirical datasets. AVAILABILITY AND IMPLEMENTATION: The data supporting this work are available in the Figshare repository at https://doi.org/10.6084/m9.figshare.25050554.v1, and the underlying code is accessible via GitHub at https://github.com/noaeker/bootstrap_repo.


Subjects
Algorithms; Machine Learning; Phylogeny; Software; Sequence Alignment/methods; Computational Biology/methods; Likelihood Functions
10.
Bioinformatics ; 40(Supplement_1): i228-i236, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940146

ABSTRACT

MOTIVATION: Recently developed spatial lineage tracing technologies induce somatic mutations at specific genomic loci in a population of growing cells and then measure these mutations in the sampled cells along with the physical locations of the cells. These technologies enable high-throughput studies of developmental processes over space and time. However, these applications rely on accurate reconstruction of a spatial cell lineage tree describing both past cell divisions and cell locations. Spatial lineage trees are related to phylogeographic models that have been well-studied in the phylogenetics literature. We demonstrate that standard phylogeographic models based on Brownian motion are inadequate to describe the spatial symmetric displacement (SD) of cells during cell division. RESULTS: We introduce a new model, the SD model for cell motility, that includes symmetric displacements of daughter cells from the parental cell followed by independent diffusion of daughter cells. We show that this model more accurately describes the locations of cells in a real spatial lineage tracing of mouse embryonic stem cells. Combining the spatial SD model with an evolutionary model of DNA mutations, we obtain a phylogeographic model for spatial lineage tracing. Using this model, we devise a maximum likelihood framework, MOLLUSC (Maximum Likelihood Estimation Of Lineage and Location Using Single-Cell Spatial Lineage tracing Data), to co-estimate time-resolved branch lengths, spatial diffusion rate, and mutation rate. On both simulated and real data, we show that MOLLUSC accurately estimates all parameters. In contrast, the Brownian motion model overestimates spatial diffusion rate in all test cases. In addition, the inclusion of spatial information improves accuracy of branch length estimation compared to sequence data alone. On real data, we show that spatial information has more signal than sequence data for branch length estimation, suggesting augmenting lineage tracing technologies with spatial information is useful to overcome the limitations of genome-editing in developmental systems. AVAILABILITY AND IMPLEMENTATION: The python implementation of MOLLUSC is available at https://github.com/raphael-group/MOLLUSC.


Subjects
Cell Division; Cell Lineage; Cell Movement; Animals; Mice; Likelihood Functions; Phylogeography; Mutation; Phylogeny
11.
J Phys Chem B ; 128(23): 5576-5589, 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38833567

ABSTRACT

Single-molecule free diffusion experiments enable accurate quantification of coexisting species or states. However, unequal brightness and diffusivity introduce a burst selection bias and affect the interpretation of experimental results. We address this issue with a photon-by-photon maximum likelihood method, burstML, which explicitly considers burst selection criteria. BurstML accurately estimates parameters, including photon count rates, diffusion times, Förster resonance energy transfer (FRET) efficiencies, and population, even in cases where species are poorly distinguished in FRET efficiency histograms. We develop a quantitative theory that determines the fraction of photon bursts corresponding to each species and thus obtain accurate species populations from the measured burst fractions. In addition, we provide a simple approximate formula for burst fractions and establish the range of parameters where unequal brightness and diffusivity can significantly affect the results obtained by conventional methods. The performance of the burstML method is compared with that of a maximum likelihood method that assumes equal species brightness and diffusivity, as well as standard Gaussian fitting of FRET efficiency histograms, using both simulated and real single-molecule data for cold-shock protein, protein L, and protein G. The burstML method enhances the accuracy of parameter estimation in single-molecule fluorescence studies.


Subjects
Fluorescence Resonance Energy Transfer; Diffusion; Photons; Likelihood Functions; Single Molecule Imaging/methods
12.
Sci Rep ; 14(1): 13392, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38862579

ABSTRACT

Cefepime and piperacillin/tazobactam are antimicrobials recommended by IDSA/ATS guidelines for the empirical management of patients admitted to the intensive care unit (ICU) with community-acquired pneumonia (CAP). Concerns have been raised about which should be used in clinical practice. This study aims to compare the effect of cefepime and piperacillin/tazobactam in critically ill CAP patients through a targeted maximum likelihood estimation (TMLE). A total of 2026 ICU-admitted patients with CAP were included. Among them, 47% presented respiratory failure, and 27% developed septic shock. A total of 68% received cefepime and 32% piperacillin/tazobactam-based treatment. After running the TMLE, we found that cefepime and piperacillin/tazobactam-based treatments have comparable 28-day, hospital, and ICU mortality. Additionally, age, PTT, serum potassium and temperature were associated with preferring cefepime over piperacillin/tazobactam (OR 1.14, 95% CI [1.01-1.27], p = 0.03; OR 1.14, 95% CI [1.03-1.26], p = 0.009; OR 1.1, 95% CI [1.01-1.22], p = 0.039; and OR 1.13, 95% CI [1.03-1.24], p = 0.014, respectively). Our study found a similar mortality rate among ICU-admitted CAP patients treated with cefepime and piperacillin/tazobactam. Clinicians may consider factors such as availability and safety profiles when making treatment decisions.


Subjects
Anti-Bacterial Agents; Cefepime; Community-Acquired Infections; Critical Illness; Intensive Care Units; Piperacillin, Tazobactam Drug Combination; Humans; Cefepime/therapeutic use; Cefepime/administration & dosage; Community-Acquired Infections/drug therapy; Community-Acquired Infections/mortality; Piperacillin, Tazobactam Drug Combination/therapeutic use; Male; Female; Aged; Middle Aged; Anti-Bacterial Agents/therapeutic use; Likelihood Functions; Pneumonia/drug therapy; Pneumonia/mortality; Piperacillin/therapeutic use
13.
Bull Math Biol ; 86(7): 85, 2024 Jun 09.
Article in English | MEDLINE | ID: mdl-38853189

ABSTRACT

How viral infections develop can change based on the number of viruses initially entering the body. The understanding of the impacts of infection doses remains incomplete, in part due to challenging constraints, and a lack of research. Gaining more insights is crucial regarding the measles virus (MV). The higher the MV infection dose, the earlier the peak of acute viremia, but the magnitude of the peak viremia remains almost constant. Measles is highly contagious, causes immunosuppression such as lymphopenia, and contributes substantially to childhood morbidity and mortality. This work investigated mechanisms underlying the observed wild-type measles infection dose responses in cynomolgus monkeys. We fitted longitudinal data on viremia using maximum likelihood estimation, and used the Akaike Information Criterion (AIC) to evaluate relevant biological hypotheses and their respective model parameterizations. The lowest AIC indicates a linear relationship between the infection dose, the initial viral load, and the initial number of activated MV-specific T cells. Early peak viremia is associated with high initial number of activated MV-specific T cells. Thus, when MV infection dose increases, the initial viremia and associated immune cell stimulation increase, and reduce the time it takes for T cell killing to be sufficient, thereby allowing dose-independent peaks for viremia, MV-specific T cells, and lymphocyte depletion. Together, these results suggest that the development of measles depends on virus-host interactions at the start and the efficiency of viral control by cellular immunity. These relationships are additional motivations for prevention, vaccination, and early treatment for measles.
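The model comparison step described above rests on the standard definition AIC = 2k - 2 ln L, where k is the number of fitted parameters and L the maximized likelihood. A minimal sketch follows; the model names and numbers are hypothetical and are not taken from the paper.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def best_by_aic(models):
    """models maps a name to (maximized log-likelihood, number of parameters).

    Returns the name with the lowest AIC and the full score table.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    return min(scores, key=scores.get), scores

# Hypothetical candidates: the richer model fits slightly better
# but pays a larger penalty for its extra parameters.
candidates = {"linear_dose": (-100.0, 3), "saturating_dose": (-99.5, 5)}
print(best_by_aic(candidates))
```

The candidate with the lowest AIC is preferred, which is how a simpler dose relationship can win over a more flexible one that fits only marginally better.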


Subjects
Macaca fascicularis; Mathematical Concepts; Measles virus; Measles; Viral Load; Viremia; Measles/immunology; Measles/transmission; Measles/prevention & control; Measles/virology; Measles/epidemiology; Animals; Viremia/immunology; Viremia/virology; Measles virus/immunology; Measles virus/pathogenicity; Measles virus/physiology; Likelihood Functions; Humans; Models, Immunological; Models, Biological; T-Lymphocytes/immunology; Lymphocyte Activation
14.
Stat Med ; 43(17): 3326-3352, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-38837431

ABSTRACT

Stepped wedge trials (SWTs) are a type of cluster randomized trial that involve repeated measures on clusters and design-induced confounding between time and treatment. Although mixed models are commonly used to analyze SWTs, they are susceptible to misspecification particularly for cluster-longitudinal designs such as SWTs. Mixed model estimation leverages both "horizontal" or within-cluster information and "vertical" or between-cluster information. To use horizontal information in a mixed model, both the mean model and correlation structure must be correctly specified or accounted for, since time is confounded with treatment and measurements are likely correlated within clusters. Alternative non-parametric methods have been proposed that use only vertical information; these are more robust because between-cluster comparisons in a SWT preserve randomization, but these non-parametric methods are not very efficient. We propose a composite likelihood method that focuses on vertical information, but has the flexibility to recover efficiency by using additional horizontal information. We compare the properties and performance of various methods, using simulations based on COVID-19 data and a demonstration of application to the LIRE trial. We found that a vertical composite likelihood model that leverages baseline data is more robust than traditional methods, and more efficient than methods that use only vertical information. We hope that these results demonstrate the potential value of model-based vertical methods for SWTs with a large number of clusters, and that these new tools are useful to researchers who are concerned about misspecification of traditional models.


Subjects
Randomized Controlled Trials as Topic; Humans; Likelihood Functions; Randomized Controlled Trials as Topic/methods; Randomized Controlled Trials as Topic/statistics & numerical data; Cluster Analysis; Computer Simulation; Models, Statistical; COVID-19; Research Design
15.
BMC Med Res Methodol ; 24(1): 132, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38849718

ABSTRACT

Accelerometers, devices that measure body movements, have become valuable tools for studying the fragmentation of rest-activity patterns, a core circadian rhythm dimension, using metrics such as interdaily stability (IS), intradaily variability (IV), transition probability (TP), and self-similarity parameter (named α). However, their use remains mainly empirical. Therefore, we investigated the mathematical properties and interpretability of rest-activity fragmentation metrics by providing mathematical proofs for the ranges of IS and IV, proposing maximum likelihood and Bayesian estimators for TP, introducing the activity balance index (ABI) metric, a transformation of α, and describing distributions of these metrics in a real-life setting. Analysis of accelerometer data from 2,859 individuals (age=60-83 years, 21.1% women) from the Whitehall II cohort (UK) shows modest correlations between the metrics, except for ABI and α. Sociodemographic (age, sex, education, employment status) and clinical (body mass index (BMI), and number of morbidities) factors were associated with these metrics, with differences observed according to metrics. For example, a difference of 5 units in BMI was associated with all metrics (differences ranging between -0.261 (95% CI -0.302, -0.220) to 0.228 (0.18, 0.268) for standardised TP rest to activity during the awake period and TP activity to rest during the awake period, respectively). These results reinforce the value of these rest-activity fragmentation metrics in epidemiological and clinical studies to examine their role for health. This paper expands on a set of methods that have previously demonstrated empirical value, improves the theoretical foundation for these methods, and evaluates their empirical use in a large dataset.
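As an illustration of what a maximum likelihood estimator of a transition probability can look like, the sketch below treats a sequence of rest/active epochs as a two-state first-order Markov chain, for which the estimate is the number of observed transitions out of a state divided by the number of epochs spent in that state (excluding the last epoch). The Markov assumption, the 0/1 coding, and the function name are assumptions made here; the paper's own estimators and its handling of awake and sleep periods are not reproduced.

```python
def transition_probabilities(states):
    """Estimate (TP rest->active, TP active->rest) from 0/1 epochs.

    states: sequence with 0 = rest and 1 = active, one value per epoch.
    Returns None for a probability whose starting state never occurs.
    """
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for prev, cur in zip(states, states[1:]):
        counts[(prev, cur)] += 1
    from_rest = counts[(0, 0)] + counts[(0, 1)]
    from_active = counts[(1, 0)] + counts[(1, 1)]
    tp_rest_to_active = counts[(0, 1)] / from_rest if from_rest else None
    tp_active_to_rest = counts[(1, 0)] / from_active if from_active else None
    return tp_rest_to_active, tp_active_to_rest

print(transition_probabilities([0, 0, 1, 1, 1, 0, 0, 1]))
```

Higher values of either probability correspond to more frequent switching between rest and activity, that is, to a more fragmented pattern.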


Subjects
Accelerometry; Rest; Humans; Female; Aged; Male; Accelerometry/methods; Accelerometry/statistics & numerical data; Middle Aged; Rest/physiology; Aged, 80 and over; Bayes Theorem; Body Mass Index; Circadian Rhythm/physiology; Likelihood Functions; Motor Activity/physiology
16.
Mol Biol Evol ; 41(7)2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38934791

ABSTRACT

We have recently introduced MAPLE (MAximum Parsimonious Likelihood Estimation), a new pandemic-scale phylogenetic inference method exclusively designed for genomic epidemiology. In response to the need for enhancing MAPLE's performance and scalability, here we present two key components: (i) CMAPLE software, a highly optimized C++ reimplementation of MAPLE with many new features and advancements, and (ii) CMAPLE library, a suite of application programming interfaces to facilitate the integration of the CMAPLE algorithm into existing phylogenetic inference packages. Notably, we have successfully integrated CMAPLE into the widely used IQ-TREE 2 software, enabling its rapid adoption in the scientific community. These advancements serve as a vital step toward better preparedness for future pandemics, offering researchers powerful tools for large-scale pathogen genomic analysis.


Subjects
Phylogeny; Software; Algorithms; Pandemics; Likelihood Functions; Humans
17.
Stat Med ; 43(19): 3723-3741, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-38890118

ABSTRACT

We consider the Bayesian estimation of the parameters of a finite mixture model from independent order statistics arising from imperfect ranked set sampling designs. As a cost-effective method, ranked set sampling enables us to incorporate easily attainable characteristics, as ranking information, into data collection and Bayesian estimation. To handle the special structure of the ranked set samples, we develop a Bayesian estimation approach exploiting the Expectation-Maximization (EM) algorithm in estimating the ranking parameters and Metropolis within Gibbs Sampling to estimate the parameters of the underlying mixture model. Our findings show that the proposed RSS-based Bayesian estimation method outperforms the commonly used Bayesian counterpart using simple random sampling. The developed method is finally applied to estimate the bone disorder status of women aged 50 and older.


Subjects
Algorithms; Bayes Theorem; Models, Statistical; Humans; Female; Middle Aged; Aged; Computer Simulation; Monte Carlo Method; Likelihood Functions; Markov Chains
18.
PLoS One ; 19(6): e0302098, 2024.
Article in English | MEDLINE | ID: mdl-38870135

ABSTRACT

Suitable combinations of observed datasets for estimating crop model parameters can reduce the computational cost while ensuring accuracy. This study aims to explore the quantitative influence of different combinations of the observed phenological stages on estimation of cultivar-specific parameters (CSPs). We used the CROPGRO-Soybean phenological model (CSPM) as a case study in combination with the Generalized Likelihood Uncertainty Estimation (GLUE) method. Different combinations of four observed phenological stages, including initial flowering, initial pod, initial grain, and initial maturity stages for five soybean cultivars from Exp. 1 and Exp. 3 described in Table 2 are respectively used to calibrate the CSPs. The CSPM, driven by the optimized CSPs, is then evaluated against two independent phenological datasets from Exp. 2 and Exp. 4 described in Table 2. Root mean square error (RMSE) (mean absolute error (MAE), coefficient of determination (R2), and Nash Sutcliffe model efficiency (NSE)) are 15.50 (14.63, 0.96, 0.42), 4.76 (3.92, 0.97, 0.95), 4.69 (3.72, 0.98, 0.95), 3.91 (3.40, 0.99, 0.96) and 12.54 (11.67, 0.95, 0.60), 5.07 (4.61, 0.98, 0.93), 4.97 (4.28, 0.97, 0.94), 4.58 (4.02, 0.98, 0.95) for using one, two, three, and four observed phenological stages in the CSPs estimation. The evaluation results suggest that RMSE and MAE decrease, and R2 and NSE increase with the increase in the number of observed phenological stages used for parameter calibration. However, there is no significant reduction in the RMSEs (MAEs, NSEs) using two, three, and four observed stages. Relatively reliable optimized CSPs for CSPM are obtained by using at least two observed phenological stages balancing calibration effect and computational cost. These findings provide new insight into parameter estimation of crop models.
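The four evaluation statistics quoted above can be computed as in the following sketch. It assumes that R2 is the squared Pearson correlation between observed and simulated values, which would explain how an R2 of 0.96 can coexist with an NSE of 0.42 when the simulations are biased; the paper may define R2 differently. The example data are hypothetical.

```python
import math

def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))

def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(s - o) for o, s in zip(obs, sim)) / len(obs)

def r_squared(obs, sim):
    """Squared Pearson correlation between observations and simulations."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    sxy = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((s - ms) ** 2 for s in sim)
    return sxy ** 2 / (sxx * syy)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of the observations."""
    mo = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1 - sse / sum((o - mo) ** 2 for o in obs)

# Simulations that track the observations perfectly but are shifted by one unit.
obs, sim = [1, 2, 3, 4], [2, 3, 4, 5]
print(rmse(obs, sim), mae(obs, sim), r_squared(obs, sim), nse(obs, sim))
```

In this toy case the correlation is perfect while the NSE is low, because NSE penalizes the constant offset and the correlation does not.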


Subjects
Crops, Agricultural; Glycine max; Glycine max/growth & development; Crops, Agricultural/growth & development; Calibration; Models, Biological; Likelihood Functions; Uncertainty
19.
Biom J ; 66(5): e202300245, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38922968

ABSTRACT

Risk prediction models fitted using maximum likelihood estimation (MLE) are often overfitted resulting in predictions that are too extreme and a calibration slope (CS) less than 1. Penalized methods, such as Ridge and Lasso, have been suggested as a solution to this problem as they tend to shrink regression coefficients toward zero, resulting in predictions closer to the average. The amount of shrinkage is regulated by a tuning parameter, λ, commonly selected via cross-validation ("standard tuning"). Though penalized methods have been found to improve calibration on average, they often over-shrink and exhibit large variability in the selected λ and hence the CS. This is a problem, particularly for small sample sizes, but also when using sample sizes recommended to control overfitting. We consider whether these problems are partly due to selecting λ using cross-validation with "training" datasets of reduced size compared to the original development sample, resulting in an over-estimation of λ and, hence, excessive shrinkage. We propose a modified cross-validation tuning method ("modified tuning"), which estimates λ from a pseudo-development dataset obtained via bootstrapping from the original dataset, albeit of larger size, such that the resulting cross-validation training datasets are of the same size as the original dataset. Modified tuning can be easily implemented in standard software and is closely related to bootstrap selection of the tuning parameter ("bootstrap tuning"). We evaluated modified and bootstrap tuning for Ridge and Lasso in simulated and real data using recommended sample sizes, and sizes slightly lower and higher. They substantially improved the selection of λ, resulting in improved CS compared to the standard tuning method. They also improved predictions compared to MLE.


Subjects
Biometry , Statistical Models , Biometry/methods , Regression Analysis , Humans , Likelihood Functions
20.
Forensic Sci Int Genet ; 71: 103057, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38733649

ABSTRACT

In recent years, probabilistic genotyping software has been adapted for the analysis of massively parallel sequencing (MPS) forensic data. Likelihood ratios (LR) are based on allele frequencies selected from populations of interest. This study provides an outline of sequence-based (SB) allele frequencies for autosomal short tandem repeats (aSTRs) and identity single nucleotide polymorphisms (iSNPs) in 371 individuals from Southern Norway. 27 aSTRs and 94 iSNPs were previously analysed with the ForenSeq™ DNA Signature Prep Kit (Verogen). The number of alleles with frequencies less than 0.05 for sequence-based alleles was 4.6 times higher than for length-based alleles. Consistent with previous studies, it was observed that sequence-based data (both with and without flanks) exhibited higher allele diversity compared to length-based (LB) data; random match probabilities were lower for SB alleles, confirming their advantage in discriminating between individuals. Two alleles in markers D22S1045 and Penta D were observed with SNPs in the 3′ flanking region, which have not been reported before. Also, a novel SNP with a minor allele frequency (MAF) of 0.001 was found in marker TH01. The impact of the sample size on MAF values was studied in 88 iSNPs from Southern Norway (n = 371). The findings were then compared to a larger Norwegian population dataset (n = 15,769). The results showed that the smaller Southern Norway dataset provided similar results, and it was a representative sample. Population structure was analyzed for regions within Southern Norway; FST estimates for aSTR and iSNPs did not indicate any genetic structure. Finally, we investigated the genetic differences between Southern Norway and two other populations: Northern Norway and Denmark. Allele frequencies between these populations were compared, and we found no significant frequency differences (p-values > 0.0001).
We also calculated the pairwise FST values per marker and comparisons between Southern and Northern Norway showed small differences. In contrast, the comparisons between Southern Norway and Denmark showed higher FST values for some markers, possibly driven by distinct alleles that were present in only one of the populations. In summary, we propose that allele frequencies from each population considered in this study could be used interchangeably to calculate genotype probabilities.


Subjects
DNA Fingerprinting , Gene Frequency , Population Genetics , High-Throughput Nucleotide Sequencing , Microsatellite Repeats , Single Nucleotide Polymorphism , Humans , Norway , DNA Sequence Analysis , Likelihood Functions , Genotype