Results 1-20 of 34
1.
Entropy (Basel) ; 26(6), 2024 May 21.
Article in English | MEDLINE | ID: mdl-38920443

ABSTRACT

A road passenger transportation enterprise is a complex system, and a clear understanding of its active safety situation (ASS), trends, and influencing factors allows transportation authorities to receive warning signals promptly and take effective measures. Through exploratory and confirmatory factor analysis, we examined potential factors for evaluating ASS and extracted an ASS index. To predict this index with a higher ASS information rate, we compared multiple time series models: GRU (gated recurrent unit), LSTM (long short-term memory), ARIMA, Prophet, Conv_LSTM, and TCN (temporal convolutional network). This paper proposes the WDA-DBN (water drop algorithm-Deep Belief Network) model and employs DeepSHAP to identify the factors that carry the most ASS information. TCN and GRU performed well in prediction, and WDA-DBN achieved the best MSE and MAE of all the models compared. Overall, the deep learning models outperformed the econometric models in information processing. The total time spent processing alarms has a positive influence on ASS, whereas the numbers of fatigue driving, abnormal driving, and nighttime driving alarm occurrences have a negative impact.

2.
Am Nat ; 199(1): 108-125, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34978965

ABSTRACT

Efforts to explain animal population cycles often invoke consumer-resource theory, which has shown that consumer-resource interactions alone can drive population cycles. Eco-evo theory instead argues that population cycles are partly driven by fluctuating selection for resistance in the resource, but support for eco-evo theory has come almost entirely from laboratory microcosms. Here we ask: can eco-evo theory explain population cycles in the field? We compared the ability of eco-evo models and classical "eco-only" models to explain data on cycles in the insect Lymantria dispar, whose outbreaks are terminated by a fatal baculovirus. Specifically, we carried out a statistical comparison of how well eco-only and eco-evo models explain combined data from L. dispar outbreak cycles and baculovirus epizootics (epidemics in animals). Both models require high host variation in resistance to explain the epizootic data. With high host variation, the eco-evo model consistently predicts outbreak cycles accurately, whereas the eco-only model can explain outbreak cycles only by invoking high levels of stochasticity, which leads to highly variable and often inaccurate predictions of outbreak cycles. Our work provides statistically robust evidence that eco-evo models can explain population cycles in the field.


Subjects
Moths, Animals, Insects, Population Dynamics
3.
Stat Sci ; 37(4): 494-518, 2022 Nov.
Article in English | MEDLINE | ID: mdl-37168541

ABSTRACT

Technological advances in the past decade, hardware and software alike, have made access to high-performance computing (HPC) easier than ever. We review these advances from a statistical computing perspective. Cloud computing makes access to supercomputers affordable. Deep learning software libraries make programming statistical algorithms easy and enable users to write code once and run it anywhere, from a laptop to a workstation with multiple graphics processing units (GPUs) or a supercomputer in the cloud. Highlighting how these developments benefit statisticians, we review recent optimization algorithms that are useful for high-dimensional models and can harness the power of HPC. Code snippets are provided to demonstrate the ease of programming. We also provide an easy-to-use distributed matrix data structure suitable for HPC. Employing this data structure, we illustrate various statistical applications, including large-scale positron emission tomography and ℓ1-regularized Cox regression. Our examples easily scale up to an 8-GPU workstation and a 720-CPU-core cluster in the cloud. As a case in point, we analyze the onset of type-2 diabetes in the UK Biobank, with 200,000 subjects and about 500,000 single nucleotide polymorphisms, using HPC ℓ1-regularized Cox regression. Fitting this half-million-variate model takes less than 45 minutes and reconfirms known associations. To our knowledge, this is the first demonstration of the feasibility of penalized regression of survival outcomes at this scale.
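The ℓ1-regularized fits mentioned above belong to a family of problems that proximal-gradient methods handle well. As a hedged illustration only (a simpler, swapped-in problem, not the authors' Cox-regression code or their distributed matrix data structure), the NumPy sketch below solves an ℓ1-penalized least-squares (lasso) problem with the iterative soft-thresholding algorithm (ISTA). The function names, penalty value, and simulated data are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: shrink every entry of z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=1000):
    """Minimize (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 with ISTA.

    Each iteration takes a gradient step on the smooth least-squares term and
    then applies the l1 proximal operator (soft-thresholding).
    """
    n, p = X.shape
    # Lipschitz constant of the least-squares gradient; 1/L is a safe step size.
    L = np.linalg.norm(X, 2) ** 2 / n
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - grad / L, lam / L)
    return b


# Hypothetical example: 200 observations, 50 predictors, 3 truly nonzero effects.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
beta_true = np.zeros(50)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(200)
beta_hat = lasso_ista(X, y, lam=0.1)
print(np.flatnonzero(np.abs(beta_hat) > 1e-8))
```

Each iteration uses only matrix-vector products and elementwise operations, which is part of why algorithms of this kind map naturally onto GPUs and clusters.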

4.
Comput Stat ; : 1-25, 2022 Apr 16.
Article in English | MEDLINE | ID: mdl-35465358

ABSTRACT

In 2019, members of the International Association for Statistical Computing (IASC) from Africa contacted members of the IASC Executive Committee to ask whether it would be feasible to establish a new regional IASC section in Africa. Establishing a new regional section requires several steps, which are outlined in the IASC Statutes at https://iasc-isi.org/statutes/. Approval likely depends on whether the proposed section has the potential to conduct typical section activities, such as organizing regional conferences, workshops, and short courses. To determine whether a regional section in Africa is feasible, the IASC must know whether there is currently enough high-level activity in computational statistics within African countries. To answer this question, we examined the author affiliations of articles published from 2015 to 2020 in the Springer journal Computational Statistics (COST) and the Elsevier journal Computational Statistics & Data Analysis (CSDA), and used these data as a proxy to compare the productivity of authors with an affiliation in Africa in 2019 and 2020 with that of authors with an affiliation in Latin America in 2015 and 2016. This article presents quantitative answers to these questions, describes how students in Utah State University's STAT 5080/6080 "Data Technologies" course in the Fall 2019 semester used web scraping techniques in a homework assignment to gather author affiliations from COST and CSDA, and evaluates student feedback obtained after the end of the course.

5.
J Anim Ecol ; 89(1): 248-267, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31587257

ABSTRACT

The advent of miniaturized biologging devices has provided ecologists with unprecedented opportunities to record animal movement across scales and has led to the collection of ever-increasing quantities of tracking data. In parallel, sophisticated tools have been developed to process, visualize and analyse tracking data; however, many of these tools have proliferated in isolation, making it challenging for users to select the most appropriate method for the question at hand. Indeed, within the R software alone, we listed 58 packages created to deal with tracking data, or 'tracking packages'. Here, we reviewed and described each tracking package based on a workflow centred around tracking data (i.e. spatio-temporal locations (x, y, t)), broken down into three stages: pre-processing, post-processing and analysis, the latter consisting of data visualization, track description, path reconstruction, behavioural pattern identification, space use characterization, trajectory simulation and others. Supporting documentation is key to making a package accessible to users. Based on a user survey, we reviewed the quality of packages' documentation and identified 11 packages with good or excellent documentation. Links between packages were assessed through a network graph analysis. Although a large group of packages showed some degree of connectivity (either depending on functions or suggesting the use of another tracking package), one third of the packages worked in isolation, reflecting a fragmentation in the R movement-ecology programming community. Finally, we provide recommendations for users when choosing packages, and for developers to maximize the usefulness of their contribution and strengthen the links within the programming community.
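As a small illustration of the "track description" stage applied to (x, y, t) locations, the Python sketch below computes step lengths, speeds, and turning angles for a single track. It is not taken from any of the reviewed R packages; the function name and the example track are hypothetical, and it assumes planar (projected) coordinates and strictly increasing timestamps.

```python
import math


def describe_track(track):
    """Return step lengths, speeds, and turning angles for one track.

    track: list of (x, y, t) tuples in planar coordinates, sorted by strictly
    increasing time. Turning angles are in radians, wrapped to (-pi, pi].
    """
    steps, speeds, headings = [], [], []
    for (x0, y0, t0), (x1, y1, t1) in zip(track, track[1:]):
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        steps.append(length)
        speeds.append(length / (t1 - t0))
        # Heading of this step; atan2(0, 0) returns 0 for a zero-length step.
        headings.append(math.atan2(dy, dx))
    turns = []
    for h0, h1 in zip(headings, headings[1:]):
        diff = h1 - h0
        # atan2(sin, cos) wraps the heading change into (-pi, pi].
        turns.append(math.atan2(math.sin(diff), math.cos(diff)))
    return steps, speeds, turns


track = [(0.0, 0.0, 0.0), (3.0, 4.0, 1.0), (3.0, 10.0, 3.0), (0.0, 10.0, 4.0)]
steps, speeds, turns = describe_track(track)
print(steps, speeds, [round(a, 3) for a in turns])
```

Step lengths and turning angles are the usual raw material for the later stages listed above, such as behavioural pattern identification and trajectory simulation.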


Subjects
Movement, Software, Animals
6.
Stat Med ; 38(18): 3460-3475, 2019 Aug 15.
Article in English | MEDLINE | ID: mdl-31099897

ABSTRACT

We propose two measures of performance for a confidence interval for a binomial proportion p: the root mean squared error and the mean absolute deviation. We also devise a confidence interval for p, based on the actual coverage function, that combines several existing approximate confidence intervals. This "Ensemble" confidence interval has better statistical properties than its constituent confidence intervals. An R package that can be used to devise and assess these confidence intervals is available on CRAN.
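Both proposed measures summarize how far an interval's actual coverage strays from its nominal level. The Python sketch below is one plausible reading of that idea, not the authors' definitions, their Ensemble interval, or their CRAN package: it computes the exact coverage function of the Wald and Wilson intervals for a fixed n by summing binomial probabilities, then takes the root mean squared and mean absolute deviation of coverage from the nominal 95% over an evenly spaced grid of p values. The grid averaging and all function names are assumptions of this sketch.

```python
import math
from statistics import NormalDist


def _z(level):
    """Two-sided standard normal critical value (about 1.96 for level 0.95)."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def wald_interval(x, n, level=0.95):
    phat = x / n
    half = _z(level) * math.sqrt(phat * (1 - phat) / n)
    return phat - half, phat + half


def wilson_interval(x, n, level=0.95):
    z = _z(level)
    phat = x / n
    denom = 1 + z * z / n
    center = (phat + z * z / (2 * n)) / denom
    half = z * math.sqrt(phat * (1 - phat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def coverage(p, n, interval, level=0.95):
    """Exact coverage at p: total binomial probability of the x whose interval contains p."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n, level)
        if lo <= p <= hi:
            total += math.comb(n, x) * p ** x * (1 - p) ** (n - x)
    return total


def coverage_errors(n, interval, level=0.95, grid=999):
    """RMSE and mean absolute deviation of coverage from the nominal level,
    averaged over an evenly spaced grid of p in (0, 1) (an assumed weighting)."""
    ps = [(k + 1) / (grid + 1) for k in range(grid)]
    devs = [coverage(p, n, interval, level) - level for p in ps]
    rmse = math.sqrt(sum(d * d for d in devs) / len(devs))
    mad = sum(abs(d) for d in devs) / len(devs)
    return rmse, mad


for name, ci in [("Wald", wald_interval), ("Wilson", wilson_interval)]:
    print(name, coverage_errors(30, ci))
```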


Subjects
Binomial Distribution, Confidence Intervals, Statistical Models, Algorithms, Biostatistics, Computational Biology, Computer Simulation, Humans, Monte Carlo Method, Software, Nonparametric Statistics
7.
Stat Med ; 37(7): 1059-1085, 2018 Mar 30.
Article in English | MEDLINE | ID: mdl-29315733

ABSTRACT

Comparative trials that report binary outcome data are commonly pooled in systematic reviews and meta-analyses. Such data can be presented as a series of 2-by-2 tables, and the pooled odds ratio is often the outcome of primary interest in the resulting meta-analysis. We examine 7 models that have been proposed for random-effects meta-analysis for this purpose. The first is the conventional model, which uses normal within-study approximations and a 2-stage approach. The others are generalised linear mixed models, which perform the analysis in 1 stage and have the potential to provide more accurate inference. We explore the implications of using these 7 models in the context of a Cochrane Review, and we also perform a simulation study. We conclude that generalised linear mixed models can result in better statistical inference than the conventional 2-stage approach, but that this type of model also presents issues and difficulties. These challenges include more demanding numerical methods and determining the best way to model study-specific baseline risks. One possible approach for analysts is to specify a primary model before performing the systematic review and to present the results from other models in a sensitivity analysis. Only one of the models that we investigate performs poorly, so any of the others could be considered for either the primary or the sensitivity analysis.
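The conventional 2-stage model first computes a log odds ratio and its approximate (normal) within-study variance for each 2-by-2 table, then pools them with random-effects weights. A minimal Python sketch follows, assuming the DerSimonian-Laird moment estimator for the between-study variance and a 0.5 continuity correction for tables with a zero cell; these are common choices, not necessarily the exact specification the authors evaluated, and the tables are hypothetical.

```python
import math


def log_odds_ratio(a, b, c, d, cc=0.5):
    """Log odds ratio and its approximate variance for one 2-by-2 table.

    a, b: events and non-events in the treatment arm
    c, d: events and non-events in the control arm
    If any cell is zero, cc is added to every cell (an assumed convention).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = a + cc, b + cc, c + cc, d + cc
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def dersimonian_laird(tables):
    """Two-stage random-effects pooling of log odds ratios (needs at least two studies).

    Returns (pooled odds ratio, between-study variance tau^2, SE of pooled log OR).
    """
    ys, vs = zip(*(log_odds_ratio(*t) for t in tables))
    w = [1 / v for v in vs]                      # fixed-effect (inverse-variance) weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(ys) - 1)) / c)     # moment estimator, truncated at 0
    w_re = [1 / (v + tau2) for v in vs]          # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    return math.exp(mu), tau2, math.sqrt(1 / sum(w_re))


# Hypothetical trials: (events_trt, nonevents_trt, events_ctl, nonevents_ctl)
tables = [(12, 88, 20, 80), (8, 42, 15, 35), (30, 170, 41, 159)]
print(dersimonian_laird(tables))
```

The 1-stage generalised linear mixed models discussed in the abstract instead model the binomial counts directly, avoiding the normal within-study approximation and the continuity correction.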


Subjects
Linear Models, Logistic Models, Meta-Analysis as Topic, Odds Ratio, Bias, Computer Simulation, Humans
8.
Brain Behav Evol ; 92(1-2): 47-62, 2018.
Article in English | MEDLINE | ID: mdl-30130751

ABSTRACT

The majority of holocephalans live in the mesopelagic zone of the deep ocean, where there is little or no sunlight, but some species migrate to brightly lit shallow waters to reproduce. This study compares the retinal morphology of two species of deep-sea chimaeras, the Pacific spookfish (Rhinochimaera pacifica) and the Carpenter's chimaera (Chimaera lignaria), with that of the elephant shark (Callorhinchus milii), a vertical migrator that lives in the mesopelagic zone but migrates to shallow water to reproduce. The two deep-sea chimaera species possess pure rod retinae with long photoreceptor outer segments that might serve to increase visual sensitivity. In contrast, the elephant shark retina possesses both rods, whose outer segments are significantly shorter (mean length 34 µm) than those of the deep-sea species, and cones, giving it the potential for color vision. The retinal ganglion cell distribution closely follows that of the photoreceptor populations in all three species, but both deep-sea species have a lower peak density of these cells (215-275 cells/mm² vs. 769 cells/mm² in the elephant shark), which represents a significant increase in the convergence of visual information (summation ratio) from photoreceptors to ganglion cells. Evidently, the eyes of deep-sea chimaeras have increased sensitivity for detecting objects at low light levels, but at the expense of both resolution and the capacity for color vision. In contrast, the elephant shark has lower sensitivity but higher visual acuity and the potential for color discrimination.


Subjects
Fishes/anatomy & histology, Fishes/physiology, Retinal Ganglion Cells, Retinal Rod Photoreceptor Cells, Ocular Vision/physiology, Visual Acuity/physiology, Animals, Species Specificity
9.
Proc Natl Acad Sci U S A ; 112(48): 14788-92, 2015 Dec 01.
Article in English | MEDLINE | ID: mdl-26554005

ABSTRACT

One of the cornerstones of the R system for statistical computing is the multitude of packages contributed by numerous package authors. This large number of packages makes an extremely broad range of statistical techniques and other quantitative methods freely available. Thus far, no empirical study has investigated the psychological factors that drive authors to participate in the R project. This article presents a study of R package authors, collecting data on different types of participation (number of packages, participation in mailing lists, participation in conferences), three psychological scales (types of motivation, psychological values, and work design characteristics), and various socio-demographic factors. The data are analyzed using item response models and subsequent generalized linear models, showing that the most important determinants of participation are a hybrid form of motivation and the social characteristics of the work design. Other factors are found to have less impact or to influence only specific aspects of participation.


Subjects
Cooperative Behavior, Mathematical Computing, Motivation, Humans, Linear Models, Psychometrics, Regression Analysis, Surveys and Questionnaires, Work
10.
Biometrics ; 69(4): 893-902, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24117144

ABSTRACT

Characterization of relationships between time-varying drug exposures and adverse events (AEs) related to health outcomes represents the primary objective in postmarketing drug safety surveillance. Such surveillance increasingly utilizes large-scale longitudinal observational databases (LODs), containing time-stamped patient-level medical information, including periods of drug exposure and dates of diagnoses, for millions of patients. Statistical methods for LODs must confront computational challenges related to the scale of the data, and must also address confounding and other biases that can undermine efforts to estimate effect sizes. Methods that compare on-drug with off-drug periods within patients offer specific advantages over between-patient analyses on both counts. To accomplish these aims, we extend the self-controlled case series (SCCS) for LODs. SCCS implicitly controls for fixed multiplicative baseline covariates, since each individual acts as their own control. In addition, only exposed cases are required for the analysis, which is computationally advantageous. The standard SCCS approach is usually used to assess single drugs and therefore estimates marginal associations between individual drugs and particular AEs. Such analyses ignore confounding drugs and interactions and can give misleading results. To avoid these difficulties, we propose a regularized multiple SCCS approach that incorporates potentially thousands or more time-varying confounders, such as other drugs. The approach successfully handles the high dimensionality and can provide a sparse solution via an L1 regularizer. We present details of the model and the associated optimization procedure, as well as the results of empirical investigations.


Subjects
Case-Control Studies, Statistical Data Interpretation, Factual Databases, Drug-Related Side Effects and Adverse Reactions/epidemiology, Longitudinal Studies, Observational Studies as Topic, Population Surveillance/methods, Humans, Incidence, Risk Assessment
11.
iScience ; 26(7): 107023, 2023 Jul 21.
Article in English | MEDLINE | ID: mdl-37534153

ABSTRACT

Maternal colonization by Group B Streptococcus (GBS) can lead to severe infection in neonates and has also been associated with prematurity and stillbirth. Better quantitative understanding of the trajectories of GBS carriage during pregnancy is essential for the design of informative epidemiological studies. Here, we describe analyses of published longitudinal data using Bayesian hidden Markov models, which involve the estimation of parameters related to the succession of latent states (infection status) and observations (culture positivity). In addition to quantifying infection acquisition and clearance probabilities, the statistical approach also suggests that for some longitudinal patterns of culture results, pregnant women were likely to have been GBS-colonized despite a negative diagnostic result. We believe this method, if used in future longitudinal studies of maternal GBS colonization, would improve our understanding of the pathologies linked to this bacterium and could also inform maternal GBS vaccine trial design.
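To make the latent-state idea concrete, the Python sketch below runs the forward-backward algorithm for a two-state hidden Markov model (0 = not colonized, 1 = colonized) with imperfect culture sensitivity. All parameter values are hypothetical and fixed in advance, whereas the study estimated such parameters in a Bayesian framework. The example shows how a negative culture between two positives can still leave a high posterior probability of colonization.

```python
def posterior_states(obs, init, trans, emit):
    """Forward-backward smoothing for a discrete hidden Markov model.

    obs:   observed culture results (0 = negative, 1 = positive) at each visit
    init:  init[s] = P(state s at the first visit)
    trans: trans[s][s2] = P(state s2 at the next visit | state s now)
    emit:  emit[s][o] = P(result o | state s)
    Returns the likelihood of obs and P(state s at visit t | all obs).
    Uses unscaled probabilities: fine for short sequences, but long ones
    would need rescaling or log-space arithmetic to avoid underflow.
    """
    n_states, T = len(init), len(obs)
    alpha = [[init[s] * emit[s][obs[0]] for s in range(n_states)]]
    for t in range(1, T):
        prev = alpha[-1]
        alpha.append([
            emit[s2][obs[t]] * sum(prev[s] * trans[s][s2] for s in range(n_states))
            for s2 in range(n_states)
        ])
    beta = [[1.0] * n_states for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(trans[s][s2] * emit[s2][obs[t + 1]] * beta[t + 1][s2] for s2 in range(n_states))
            for s in range(n_states)
        ]
    likelihood = sum(alpha[-1])
    posterior = [[alpha[t][s] * beta[t][s] / likelihood for s in range(n_states)]
                 for t in range(T)]
    return likelihood, posterior


# Hypothetical parameters: no false positives, 80% culture sensitivity.
init = [0.8, 0.2]
trans = [[0.9, 0.1],   # acquisition probability 0.1 per interval
         [0.2, 0.8]]   # clearance probability 0.2 per interval
emit = [[1.0, 0.0],    # not colonized: always culture-negative
        [0.2, 0.8]]    # colonized: culture-positive with probability 0.8
lik, post = posterior_states([1, 0, 1], init, trans, emit)
print(round(post[1][1], 3))  # P(colonized at the negative middle visit)
```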

12.
iScience ; 26(11): 108354, 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-38026214

ABSTRACT

Classic ANOVA (cA) tests the explanatory power of a partitioning of a set of objects. Nonparametric ANOVA (npA), which is better suited to cluster proximity analysis, extends cA to the case where only the mutual distances between objects, rather than the object values themselves, are available. However, although npA extends the applicability of cA, its metric conditions are limiting. Building on the central limit theorem (CLT), we introduce nonmetric ANOVA (nmA), which relaxes the metric properties between objects and thereby allows ANOVA-like statistical testing of the disparity between network clusters. We present a parametric test statistic that, under the null hypothesis of no difference between the means of the competing clusters, follows an exact F-distribution. We apply our method to three diverse biological examples, discuss its performance in parallel with the other methods, and note the specific use of each method as dictated by the inherent properties of the data. The R code is provided at github.com/AmiryousefiLab/nmANOVA.
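For reference, the classic ANOVA (cA) statistic that nmA generalizes compares between-group and within-group variability of the object values themselves. The plain-Python sketch below computes that one-way F statistic; it is not the nmANOVA statistic (which works from nonmetric pairwise relations and is implemented in R at the linked repository), and the example groups are hypothetical.

```python
def one_way_anova_f(groups):
    """Classic one-way ANOVA: F = [SS_between / (k - 1)] / [SS_within / (N - k)].

    groups: list of k lists of numeric values (one list per cluster).
    Under the null hypothesis of equal means, with independent normal errors of
    equal variance, F follows an F-distribution with (k - 1, N - k) degrees of freedom.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Here SS_between = 54 and SS_within = 6 with k = 3 and N = 9, so F = (54 / 2) / (6 / 6) = 27.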

13.
Med Biol Eng Comput ; 61(1): 75-95, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36322242

ABSTRACT

Customization of cardiac action potential models has become increasingly important with the recognition of patient-specific models and virtual patient cohorts as valuable predictive tools. Nevertheless, developing customized models by fitting parameters to data poses technical and methodological challenges: despite noise and variability associated with real-world datasets, traditional optimization methods produce a single "best-fit" set of parameter values. Bayesian estimation methods seek distributions of parameter values given the data by obtaining samples from the target distribution, but in practice widely known Bayesian algorithms like Markov chain Monte Carlo tend to be computationally inefficient and scale poorly with the dimensionality of parameter space. In this paper, we consider two computationally efficient Bayesian approaches: the Hamiltonian Monte Carlo (HMC) algorithm and the approximate Bayesian computation sequential Monte Carlo (ABC-SMC) algorithm. We find that both methods successfully identify distributions of model parameters for two cardiac action potential models using model-derived synthetic data and an experimental dataset from a zebrafish heart. Although both methods appear to converge to the same distribution family and are computationally efficient, HMC generally finds narrower marginal distributions, while ABC-SMC is less sensitive to the algorithmic settings including the prior distribution.


Subjects
Algorithms, Zebrafish, Animals, Bayes Theorem, Monte Carlo Method, Markov Chains
14.
MethodsX ; 9: 101599, 2022.
Article in English | MEDLINE | ID: mdl-34917491

ABSTRACT

The seabird meta-population viability model (mPVA) uses a generalized approach to project abundance and quasi-extinction risk for 102 seabird species under various conservation scenarios. The mPVA is a stage-structured projection matrix that tracks the abundance of multiple populations linked by dispersal, accounting for breeding island characteristics and spatial distribution. Data are derived from published studies, grey literature, and expert review (with over 500 contributions). Invasive species impacts were generalized to stage-specific vital rates by fitting a Bayesian state-space model to trend data from islands where invasive removals had occurred, while accounting for characteristics of seabird biology, breeding islands and invasive species. Survival rates were estimated using a competing hazards formulation to account for the impacts of multiple threats, while also allowing for environmental and demographic stochasticity, density dependence and parameter uncertainty.
• The mPVA provides resource managers with a tool to quantitatively assess the potential benefits of alternative management actions for multiple species.
• The mPVA compares projected abundance and quasi-extinction risk under current conditions (no intervention) and various conservation scenarios, including removal of invasive species from specified breeding islands, translocation or reintroduction of individuals to an island of specified location and size, and at-sea mortality amelioration via reduction in annual at-sea deaths.
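The core of a stage-structured projection is repeated multiplication of an abundance vector by a matrix of stage-specific vital rates. The NumPy sketch below illustrates that general technique for a single hypothetical three-stage population with made-up rates; unlike the mPVA, it ignores dispersal between populations, density dependence, and parameter uncertainty, and it approximates quasi-extinction risk by Monte Carlo with a lognormal environmental multiplier on fecundity only (an assumption of this sketch).

```python
import numpy as np

# Hypothetical 3-stage life cycle (juveniles, subadults, breeders); rates are made up.
A_GROWING = np.array([
    [0.0, 0.0, 0.35],  # fecundity: juveniles produced per breeder per year
    [0.6, 0.0, 0.0],   # juvenile survival into the subadult stage
    [0.0, 0.8, 0.9],   # subadult recruitment to breeding; breeder survival
])


def growth_rate(A):
    """Asymptotic growth rate lambda: the dominant eigenvalue of A."""
    return float(max(abs(np.linalg.eigvals(A))))


def project(A, n0, years):
    """Deterministic projection n(t+1) = A n(t); returns a (years + 1) x stages array."""
    traj = [np.asarray(n0, dtype=float)]
    for _ in range(years):
        traj.append(A @ traj[-1])
    return np.array(traj)


def quasi_extinction_risk(A, n0, years, threshold, sigma=0.3, n_sims=500, seed=1):
    """Fraction of simulated runs whose total abundance drops below threshold.

    Environmental stochasticity (assumed): each year the fecundity row is
    multiplied by a lognormal factor with mean 1.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        n = np.asarray(n0, dtype=float)
        for _ in range(years):
            A_t = A.copy()
            A_t[0, :] *= rng.lognormal(mean=-sigma ** 2 / 2, sigma=sigma)
            n = A_t @ n
            if n.sum() < threshold:
                hits += 1
                break
    return hits / n_sims


A_declining = A_GROWING.copy()
A_declining[0, 2] = 0.05  # hypothetical scenario with reduced breeding success
n0 = [50, 50, 200]
print(growth_rate(A_GROWING), growth_rate(A_declining))
print(quasi_extinction_risk(A_GROWING, n0, 50, 10),
      quasi_extinction_risk(A_declining, n0, 50, 10))
```

Comparing scenarios then amounts to editing the vital rates (for example, raising fecundity after a hypothetical invasive removal) and re-running the same projection.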

15.
iScience ; 24(8): 102853, 2021 Aug 20.
Article in English | MEDLINE | ID: mdl-34381977

ABSTRACT

Bayes' rule is a fundamental principle that has been applied across multiple disciplines. However, few studies have addressed its origin as a cognitive strategy or the underlying basis for generalizing from a small sample. Using a simple binary choice model subject to natural selection, we derive Bayesian inference as an adaptive behavior in certain stochastic environments. This behavior emerges purely through the forces of evolution, even though our population consists of mindless individuals without any ability to reason, act strategically, or accurately encode or infer environmental states probabilistically. In addition, three kinds of environments favor the emergence of finite memory: Markov environments, nonstationary environments, and environments in which sampling contains too little or too much information about local conditions. These results provide an explanation for several known phenomena in human cognition, including deviations from the optimal Bayesian strategy and finite memory beyond resource constraints.

16.
Brain Sci ; 11(1), 2021 Jan 12.
Article in English | MEDLINE | ID: mdl-33445771

ABSTRACT

One significant characteristic of Multiple Sclerosis (MS), a chronic inflammatory demyelinating disease of the central nervous system, is the evolution of highly variable patterns of white matter lesions. Based on geostatistical metrics, the MS-Lesion Pattern Discrimination Plot reduces complex three- and four-dimensional configurations of MS-White Matter Lesions to a well-arranged and standardized two-dimensional plot that facilitates follow-up, cross-sectional and medication impact analysis. Here, we present a script that generates the MS-Lesion Pattern Discrimination Plot using the widespread statistical computing environment R. The script's input data are NIfTI-1 or Analyze-7.5 files with individual MS-White Matter Lesion masks in Montreal Normal Brain geometry. The MS-Lesion Pattern Discrimination Plot, variogram plots and associated fitting statistics are output to the R console and exported to standard graphics and text files. Besides reviewing relevant geostatistical basics and commenting on implementation details for smooth customization and extension, the paper guides the reader through generating MS-Lesion Pattern Discrimination Plots using publicly available synthetic MS-Lesion patterns. The paper is accompanied by the R script LDPgenerator.r, a small sample data set and associated graphics for comparison.
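Variogram plots rest on the empirical semivariogram, which averages half the squared difference between values at locations separated by a given distance. A generic NumPy sketch of that estimator (the classical Matheron estimator with distance bins) follows; it is not LDPgenerator.r, the function name is hypothetical, and applying it to voxel coordinates and lesion-mask values would be an assumption on the reader's part.

```python
import numpy as np


def empirical_semivariogram(coords, values, bin_edges):
    """Classical (Matheron) estimator: gamma(h) = mean of (z_i - z_j)^2 / 2 over
    all pairs whose separation distance falls in the bin around h.

    coords: (n, d) array of locations (a 1-D array is treated as d = 1)
    values: length-n array of observed values
    bin_edges: increasing distance bin edges; bin b covers [edges[b], edges[b+1])
    Returns (gamma per bin, pair count per bin); empty bins give NaN.
    """
    coords = np.asarray(coords, dtype=float)
    if coords.ndim == 1:
        coords = coords[:, None]
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)         # each unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    half_sq = 0.5 * (values[i] - values[j]) ** 2
    n_bins = len(bin_edges) - 1
    which = np.digitize(dist, bin_edges) - 1         # bin index of each pair
    gamma = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = which == b
        counts[b] = in_bin.sum()
        if counts[b]:
            gamma[b] = half_sq[in_bin].mean()
    return gamma, counts


# Hypothetical 1-D example: a linear trend z = x sampled at x = 0..9.
gamma, counts = empirical_semivariogram(np.arange(10), np.arange(10), [0.5, 1.5, 2.5])
print(gamma, counts)
```

A parametric variogram model (for example spherical or exponential) would then be fitted to these binned values to obtain fitting statistics.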

17.
Epidemiol Health ; 42: e2020028, 2020.
Article in English | MEDLINE | ID: mdl-32512670

ABSTRACT

Coronavirus disease 2019 (COVID-19), which causes severe respiratory illness, has become a pandemic, and the World Health Organization has declared it a public health emergency of international concern. We developed a susceptible-exposed-infected-recovered (SEIR) model for COVID-19 to show the importance of estimating the basic reproduction number (R0). This work focuses on predicting the COVID-19 outbreak in India in its early stage, based on an estimate of R0. The model is intended to help policymakers take active measures before COVID-19 spreads further. Data on daily new cases in India from March 2, 2020 to April 2, 2020 were used to estimate R0 with the earlyR package. The maximum-likelihood approach was used to analyze the distribution of R0 values, and bootstrap resampling was applied to identify the most likely R0 value. We estimated the median R0 to be 1.471 (95% confidence interval [CI], 1.351 to 1.592) and predicted that the new case count may reach 39,382 (95% CI, 34,300 to 47,351) in 30 days.
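The SEIR compartments can be simulated with a few lines of forward-Euler integration. In this minimal Python sketch, compartments are population fractions, there are no births or deaths, and R0 = β/γ; the incubation and infectious periods are hypothetical, and this is not the earlyR-based estimation procedure used in the study. A useful check is the classic final-size relation z = 1 - exp(-R0 z), which this closed model should approximately satisfy when the initial infected fraction is small.

```python
import math


def simulate_seir(r0, incubation_days, infectious_days, i0, days, dt=0.05):
    """Forward-Euler SEIR in a closed population; compartments are fractions summing to 1.

    beta = r0 * gamma, so r0 = beta / gamma (valid here because there are no births/deaths).
    Returns the (S, E, I, R) fractions after `days`.
    """
    gamma = 1.0 / infectious_days   # recovery rate
    sigma = 1.0 / incubation_days   # rate of progression from exposed to infectious
    beta = r0 * gamma               # transmission rate
    s, e, i, r = 1.0 - i0, 0.0, i0, 0.0
    for _ in range(int(round(days / dt))):
        new_exposed = beta * s * i * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
    return s, e, i, r


def final_size(r0, iterations=200):
    """Solve z = 1 - exp(-r0 * z) by fixed-point iteration (for r0 > 1)."""
    z = 0.5
    for _ in range(iterations):
        z = 1.0 - math.exp(-r0 * z)
    return z


# Hypothetical parameters: R0 = 2, 5-day incubation, 7-day infectious period.
s, e, i, r = simulate_seir(2.0, 5.0, 7.0, i0=1e-4, days=800)
print(round(r, 3), round(final_size(2.0), 3))
```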


Subjects
Basic Reproduction Number/statistics & numerical data, Coronavirus Infections/epidemiology, Disease Outbreaks, Viral Pneumonia/epidemiology, COVID-19, Forecasting, Humans, India/epidemiology, Mathematical Computing, Pandemics
18.
iScience ; 14: 125-135, 2019 Apr 26.
Article in English | MEDLINE | ID: mdl-30954780

ABSTRACT

LOVE, a robust, scalable latent model-based clustering method for biological discovery, can be used across a range of datasets to generate both overlapping and non-overlapping clusters. In our formulation, a cluster comprises variables associated with the same latent factor and is determined from an allocation matrix that indexes our latent model. We prove that the allocation matrix and corresponding clusters are uniquely defined. We apply LOVE to biological datasets (gene expression, serological responses measured from HIV controllers and chronic progressors, vaccine-induced humoral immune responses) resulting in meaningful biological output. For all three datasets, the clusters generated by LOVE remain stable across tuning parameters. Finally, we compared LOVE's performance to that of 13 state-of-the-art methods using previously established benchmarks and found that LOVE outperformed these methods across datasets. Our results demonstrate that LOVE can be broadly used across large-scale biological datasets to generate accurate and meaningful overlapping and non-overlapping clusters.

19.
Ann Transl Med ; 7(23): 796, 2019 Dec.
Article in English | MEDLINE | ID: mdl-32042812

ABSTRACT

This article is part of a series on the methodology of clinical prediction model construction (16 sections in total). The first section introduces the concept, current applications, construction methods and processes, and classification of clinical prediction models, as well as the conditions necessary for conducting such research and the problems currently faced. The second section concentrates on screening methods in multivariate regression analysis. The third section introduces the construction of prediction models based on logistic regression and nomogram drawing. The fourth section concentrates on the Cox proportional hazards regression model and nomogram drawing. The fifth section introduces the calculation of the C-statistic in the logistic regression model. The sixth section introduces two common methods for calculating the C-index in Cox regression with R. The seventh section focuses on the principle and calculation of the Net Reclassification Index (NRI) with R. The eighth section focuses on the principle and calculation of the Integrated Discrimination Index (IDI) with R. The ninth section continues to explore the evaluation of clinical utility after predictive model construction: Decision Curve Analysis. The tenth section supplements the ninth and introduces Decision Curve Analysis for survival outcome data. The eleventh section discusses external validation of the logistic regression model. The twelfth section discusses the in-depth evaluation of the Cox regression model with R, including calculating the concordance index of discrimination (C-index) in the validation data set and drawing the calibration curve. The thirteenth section introduces how to handle survival outcome data using a competing risk model with R. The fourteenth section introduces how to draw the nomogram of the competing risk model with R. The fifteenth section discusses the identification of outliers and the imputation of missing values. The sixteenth section introduces advanced variable selection methods in linear models, such as ridge regression and LASSO regression.
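For a logistic model, the C-statistic equals the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it, with ties counting one half; this is also the area under the ROC curve. The series itself works in R; the pairwise Python sketch below only illustrates that definition, uses hypothetical data, and favors clarity over speed (it loops over every event/non-event pair).

```python
def c_statistic(y_true, y_score):
    """Concordance (C-statistic) for a binary outcome.

    y_true: 0/1 outcomes; y_score: predicted risks (e.g. from a logistic model).
    Counts each (event, non-event) pair as 1 if the event has the higher score,
    0.5 if the scores tie, and 0 otherwise, then divides by the number of pairs.
    """
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted risks for four patients.
print(c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```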

20.
PeerJ Comput Sci ; 5: e175, 2019.
Article in English | MEDLINE | ID: mdl-33816828

ABSTRACT

Today's computational researchers are expected to be highly proficient in using software to solve a wide range of problems, from processing large datasets to developing personalized treatment strategies from a growing range of options. Researchers are well versed in their own fields but may lack formal training and appropriate mentorship in software engineering principles. Two major themes covered neither in most university coursework nor in the current literature are software testing and software optimization. Through a survey of all currently available Comprehensive R Archive Network (CRAN) packages, we show that reproducible and replicable software tests are frequently unavailable and that many packages do not appear to employ software performance and optimization tools and techniques. Using examples from an existing R package, we demonstrate powerful testing and optimization techniques that can improve the quality of any researcher's software.
