Results 1 - 20 of 35
1.
J Phys Chem B ; 127(11): 2362-2374, 2023 Mar 23.
Article in English | MEDLINE | ID: mdl-36893480

ABSTRACT

Ordinary differential equation (ODE) models are widely used to describe chemical or biological processes. This Article considers the estimation and assessment of such models on the basis of time-course data. Due to experimental limitations, time-course data are often noisy, and some components of the system may not be observed. Furthermore, the computational demands of numerical integration have hindered the widespread adoption of time-course analysis using ODEs. To address these challenges, we explore the efficacy of the recently developed MAGI (MAnifold-constrained Gaussian process Inference) method for ODE inference. First, via a range of examples we show that MAGI is capable of inferring the parameters and system trajectories, including unobserved components, with appropriate uncertainty quantification. Second, we illustrate how MAGI can be used to assess and select different ODE models with time-course data based on MAGI's efficient computation of model predictions. Overall, we believe MAGI is a useful method for the analysis of time-course data in the context of ODE models, which bypasses the need for any numerical integration.

2.
NPJ Precis Oncol ; 5(1): 82, 2021 Sep 10.
Article in English | MEDLINE | ID: mdl-34508179

ABSTRACT

Immune checkpoint inhibitors have demonstrated significant survival benefits in treating many types of cancers. However, their immune-related adverse events (irAEs) have not been systematically evaluated across cancer types in large-scale real-world populations. To address this gap, we conducted real-world data analyses using nationwide insurance claims data with 85.97 million enrollees across 8 years. We identified a significantly increased risk of developing irAEs among patients receiving immunotherapy agents in all seven cancer types commonly treated with immune checkpoint inhibitors. Those receiving immunotherapy were 1.50-4.00 times (95% CI, lower bound from 1.15 to 2.16, upper bound from 1.69 to 20.36) more likely to develop irAEs in the first 6 months of treatment than matched chemotherapy or targeted therapy groups, with a total of 92,858 patients. The risk of developing irAEs is higher among patients using nivolumab than among those using pembrolizumab. These results confirmed the need for clinicians to assess irAEs among cancer patients undergoing immunotherapy as part of management. Our methods are extensible to characterizing the effectiveness and adverse effects of novel treatments in large populations in an efficient and economical fashion.

3.
Proc Natl Acad Sci U S A ; 118(15), 2021 04 13.
Article in English | MEDLINE | ID: mdl-33837150

ABSTRACT

Parameter estimation for nonlinear dynamic system models, represented by ordinary differential equations (ODEs), using noisy and sparse data, is a vital task in many fields. We propose a fast and accurate method, manifold-constrained Gaussian process inference (MAGI), for this task. MAGI uses a Gaussian process model over time series data, explicitly conditioned on the manifold constraint that derivatives of the Gaussian process must satisfy the ODE system. By doing so, we completely bypass the need for numerical integration and achieve substantial savings in computational time. MAGI is also suitable for inference with unobserved system components, which often occur in real experiments. MAGI is distinct from existing approaches as we provide a principled statistical construction under a Bayesian framework, which incorporates the ODE system through the manifold constraint. We demonstrate the accuracy and speed of MAGI using realistic examples based on physical experiments.

4.
Sci Rep ; 11(1): 4023, 2021 02 17.
Article in English | MEDLINE | ID: mdl-33597556

ABSTRACT

For epidemic control and prevention, timely insight into potential hot spots is invaluable. As an alternative to traditional epidemic surveillance, which often lags behind real time by weeks, big data from the Internet provide important information about current epidemic trends. Here we present a methodology, ARGOX (Augmented Regression with GOogle data CROSS space), for accurate real-time tracking of state-level influenza epidemics in the United States. ARGOX combines Internet search data at the national, regional and state levels with traditional influenza surveillance data from the Centers for Disease Control and Prevention, and accounts for both the spatial correlation structure of state-level influenza activities and the evolution of people's Internet search patterns. ARGOX achieves on average 28% error reduction over the best alternative for real-time state-level influenza estimation for 2014 to 2020. ARGOX is robust and reliable and can potentially be applied to track county- and city-level influenza activity and other infectious diseases.


Subject(s)
Epidemics/prevention & control , Influenza, Human/epidemiology , Internet Use/trends , Big Data , Centers for Disease Control and Prevention, U.S. , Epidemics/statistics & numerical data , Epidemiological Monitoring , Humans , Internet/trends , Population Surveillance/methods , Search Engine/trends , United States/epidemiology
5.
J Med Internet Res ; 22(8): e16709, 2020 08 05.
Article in English | MEDLINE | ID: mdl-32755895

ABSTRACT

BACKGROUND: Chest computed tomography (CT) is crucial for the detection of lung cancer, and many automated CT evaluation methods have been proposed. Due to the divergent software dependencies of the reported approaches, the developed methods are rarely compared or reproduced. OBJECTIVE: The goal of the research was to generate reproducible machine learning modules for lung cancer detection and compare the approaches and performances of the award-winning algorithms developed in the Kaggle Data Science Bowl. METHODS: We obtained the source codes of all award-winning solutions of the Kaggle Data Science Bowl Challenge, where participants developed automated CT evaluation methods to detect lung cancer (training set n=1397, public test set n=198, final test set n=506). The performance of the algorithms was evaluated by the log-loss function, and the Spearman correlation coefficient of the performance in the public and final test sets was computed. RESULTS: Most solutions implemented distinct image preprocessing, segmentation, and classification modules. Variants of U-Net, VGGNet, and residual net were commonly used in nodule segmentation, and transfer learning was used in most of the classification algorithms. Substantial performance variations in the public and final test sets were observed (Spearman correlation coefficient = .39 among the top 10 teams). To ensure the reproducibility of results, we generated a Docker container for each of the top solutions. CONCLUSIONS: We compared the award-winning algorithms for lung cancer detection and generated reproducible Docker images for the top solutions. Although convolutional neural networks achieved decent accuracy, there is plenty of room for improvement regarding model generalizability.


Subject(s)
Lung Neoplasms/diagnostic imaging , Lung Neoplasms/diagnosis , Machine Learning/standards , Tomography, X-Ray Computed/methods , Algorithms , Humans , Reproducibility of Results
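
The two evaluation measures named in the abstract of entry 5 (log-loss for each solution, and the Spearman correlation between public and final test-set performance) can be sketched as follows. This is a minimal illustration with NumPy only; the function names and the toy inputs are assumptions for this sketch and are not taken from the study's code.

```python
import numpy as np

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    v = np.asarray(values, dtype=float)
    order = np.argsort(v, kind="mergesort")
    r = np.empty(len(v), dtype=float)
    r[order] = np.arange(1, len(v) + 1)
    for val in np.unique(v):
        mask = v == val
        r[mask] = r[mask].mean()
    return r

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return float(np.corrcoef(ranks(a), ranks(b))[0, 1])

# Hypothetical usage: log-loss of each team on the two test sets.
public_scores = [0.40, 0.42, 0.45, 0.47]
final_scores = [0.44, 0.41, 0.50, 0.46]
print(spearman(public_scores, final_scores))
```

A low rank correlation between the two score lists means that a team's standing on the public set says little about its standing on the final set, which is the generalizability concern the abstract raises.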
6.
Proc Natl Acad Sci U S A ; 117(22): 12004-12010, 2020 06 02.
Article in English | MEDLINE | ID: mdl-32414914

ABSTRACT

A catalytic prior distribution is designed to stabilize a high-dimensional "working model" by shrinking it toward a "simplified model." The shrinkage is achieved by supplementing the observed data with a small amount of "synthetic data" generated from a predictive distribution under the simpler model. We apply this framework to generalized linear models, where we propose various strategies for the specification of a tuning parameter governing the degree of shrinkage and study resultant theoretical properties. In simulations, the resulting posterior estimation using such a catalytic prior outperforms maximum likelihood estimation from the working model and is generally comparable with or superior to existing competitive methods in terms of frequentist prediction accuracy of point estimation and coverage accuracy of interval estimation. The catalytic priors have simple interpretations and are easy to formulate.


Subject(s)
Computer Simulation/statistics & numerical data , Linear Models , Bayes Theorem , Computer Simulation/trends , Data Analysis , Data Collection , Sample Size , Statistics as Topic
7.
Clin Pharmacol Ther ; 107(2): 388-396, 2020 02.
Article in English | MEDLINE | ID: mdl-31356677

ABSTRACT

The autoimmune adverse effects of lung cancer immunotherapy are not fully understood at the population level. Using observational data from commercial health insurance claims, we compared the autoimmune disease risk of immune checkpoint inhibitors (including pembrolizumab and nivolumab) with that of chemotherapy using the matching method. By 6 months after treatment initiation, the cumulative incidence of new autoimmune diseases among patients receiving immunotherapy was 13.13% (95% confidence interval (CI), 10.79-15.50%) and that of the matched chemotherapy patients was 6.65% (95% CI, 5.79-7.50%), constituting a hazard ratio (HR) of 1.97 (95% CI, 1.58-2.48). Both pembrolizumab (HR = 2.06 (95% CI, 1.20-3.65), P = 0.0032) and nivolumab (HR = 1.76 (95% CI, 1.39-2.24), P < 0.0001) were associated with higher risks of developing autoimmune diseases, especially for hypothyroidism (P < 0.0001). Our findings suggest the need to monitor autoimmune side effects of immunotherapy.


Subject(s)
Antibodies, Monoclonal, Humanized/adverse effects , Antineoplastic Agents, Immunological/adverse effects , Autoimmune Diseases/chemically induced , Lung Neoplasms/drug therapy , Nivolumab/adverse effects , Adult , Aged , Antibodies, Monoclonal, Humanized/therapeutic use , Antineoplastic Agents, Immunological/therapeutic use , Female , Humans , Male , Middle Aged , Nivolumab/therapeutic use
8.
Sci Rep ; 9(1): 5238, 2019 03 27.
Article in English | MEDLINE | ID: mdl-30918276

ABSTRACT

Accurate, high-resolution tracking of influenza epidemics at the regional level helps public health agencies make informed and proactive decisions, especially in the face of outbreaks. Internet users' online searches offer great potential for the regional tracking of influenza. However, due to the complex data structure and reduced quality of Internet data at the regional level, few established methods provide satisfactory performance. In this article, we propose a novel method named ARGO2 (2-step Augmented Regression with GOogle data) that efficiently combines publicly available Google search data at different resolutions (national and regional) with traditional influenza surveillance data from the Centers for Disease Control and Prevention (CDC) for accurate, real-time regional tracking of influenza. ARGO2 gives very competitive performance across all US regions compared with available Internet-data-based regional influenza tracking methods, and it has achieved 30% error reduction over the best alternative method that we numerically tested for the period of March 2009 to March 2018. ARGO2 is reliable and robust, with the flexibility to incorporate additional information from other sources and resolutions, making it a powerful tool for regional influenza tracking, and potentially for tracking other social, economic, or public health events at the regional or local level.


Subject(s)
Data Mining , Epidemiological Monitoring , Influenza, Human/epidemiology , Internet , Humans
10.
BMC Infect Dis ; 17(1): 332, 2017 05 08.
Article in English | MEDLINE | ID: mdl-28482810

ABSTRACT

BACKGROUND: Accurate influenza activity forecasting helps public health officials prepare and allocate resources for unusual influenza activity. Traditional flu surveillance systems, such as the Centers for Disease Control and Prevention's (CDC) influenza-like illnesses reports, lag behind real time by 1 to 2 weeks, whereas information contained in cloud-based electronic health records (EHR) and in Internet users' search activity is typically available in near real time. We present a method that combines the information from these two data sources with historical flu activity to produce national flu forecasts for the United States up to 4 weeks ahead of the publication of CDC's flu reports. METHODS: We extend a method originally designed to track flu using Google searches, named ARGO, to combine information from EHR and Internet searches with historical flu activities. Our regularized multivariate regression model dynamically selects the most appropriate variables for flu prediction every week. The model is assessed for the flu seasons within the time period 2013-2016 using multiple metrics including root mean squared error (RMSE). RESULTS: Our method reduces the RMSE of the publicly available alternative (Healthmap flutrends) method by 33%, 20%, 17%, and 21% for the four time horizons: real time and 1, 2, and 3 weeks ahead, respectively. Such accuracy improvements are statistically significant at the 5% level. Our real-time estimates correctly identified the peak timing and magnitude of the studied flu seasons. CONCLUSIONS: Our method significantly reduces the prediction error when compared to historical publicly available Internet-based prediction systems, demonstrating that: (1) the method to combine data sources is as important as data quality; (2) effectively extracting information from a cloud-based EHR and Internet search activity leads to accurate flu forecasts.


Subject(s)
Centers for Disease Control and Prevention, U.S. , Electronic Health Records , Influenza, Human/epidemiology , Forecasting , Humans , Internet , Population Surveillance/methods , Seasons , United States
11.
Proteins ; 85(8): 1402-1412, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28378911

ABSTRACT

In the prediction of protein structure from amino acid sequence, loops are challenging regions for computational methods. Since loops are often located on the protein surface, they can have significant roles in determining protein functions and binding properties. Loop prediction without the aid of a structural template requires extensive conformational sampling and energy minimization, which are computationally difficult. In this article we present a new de novo loop sampling method, the Parallely filtered Energy Targeted All-atom Loop Sampler (PETALS), to rapidly locate low energy conformations. PETALS explores both backbone and side-chain positions of the loop region simultaneously according to the energy function selected by the user, and constructs a nonredundant ensemble of low energy loop conformations using filtering criteria. The method is illustrated with the DFIRE potential and DiSGro energy function for loops, and shown to be highly effective at discovering conformations with near-native (or better) energy. Using the same energy function as the DiSGro algorithm, PETALS samples conformations with both lower RMSDs and lower energies. PETALS is also useful for assessing the accuracy of different energy functions. PETALS runs rapidly, requiring an average time cost of 10 minutes for a length 12 loop on a single 3.2 GHz processor core, comparable to the fastest existing de novo methods for generating an ensemble of conformations. © 2017 Wiley Periodicals, Inc.


Subject(s)
Algorithms , Amino Acids/chemistry , Computational Biology/methods , Proteins/chemistry , Amino Acid Sequence , Computer Simulation , Models, Molecular , Protein Conformation, alpha-Helical , Protein Interaction Domains and Motifs , Thermodynamics
12.
J Am Stat Assoc ; 111(513): 314-330, 2016.
Article in English | MEDLINE | ID: mdl-27212739

ABSTRACT

This paper studies the estimation of a stepwise signal. To determine the number and locations of change-points of the stepwise signal, we formulate a maximum marginal likelihood estimator, which can be computed with a quadratic cost using dynamic programming. We carry out an extensive investigation on the choice of the prior distribution and study the asymptotic properties of the maximum marginal likelihood estimator. We propose to treat each possible set of change-points equally and adopt an empirical Bayes approach to specify the prior distribution of segment parameters. A detailed simulation study is performed to compare the effectiveness of this method with other existing methods. We demonstrate our method on single-molecule enzyme reaction data and on DNA array CGH data. Our study shows that this method is applicable to a wide range of models and offers appealing results in practice.
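
To illustrate how a dynamic program can locate change-points at quadratic cost, here is a minimal sketch. It deliberately swaps in a simpler criterion than the abstract's: penalized least squares (squared error within each segment plus a fixed `penalty` per change-point) instead of the paper's marginal likelihood. The function name and the penalty value are assumptions made for this sketch.

```python
import numpy as np

def find_change_points(y, penalty):
    """Penalized least-squares segmentation by dynamic programming, O(n^2) time.

    Returns the sorted indices at which a new segment starts (index 0 excluded).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Prefix sums make the squared-error cost of any segment y[i:j] O(1).
    s1 = np.concatenate([[0.0], np.cumsum(y)])
    s2 = np.concatenate([[0.0], np.cumsum(y ** 2)])

    def cost(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = np.full(n + 1, np.inf)  # best[t]: optimal penalized cost of y[:t]
    best[0] = -penalty             # k segments then pay for k - 1 change-points
    prev = np.zeros(n + 1, dtype=int)
    for t in range(1, n + 1):
        for s in range(t):         # s is the start of the last segment y[s:t]
            value = best[s] + cost(s, t) + penalty
            if value < best[t]:
                best[t], prev[t] = value, s
    # Walk back through the stored segment starts.
    change_points, t = [], n
    while t > 0:
        t = prev[t]
        if t > 0:
            change_points.append(int(t))
    return sorted(change_points)

print(find_change_points([0, 0, 0, 0, 5, 5, 5, 5], penalty=1.0))
```

The double loop over `(s, t)` is where the quadratic cost comes from; replacing `cost` with a segment's negative log marginal likelihood would keep the same recursion.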

13.
Ann Stat ; 44(2): 564-597, 2016 Mar 01.
Article in English | MEDLINE | ID: mdl-27041778

ABSTRACT

This paper discusses the simultaneous inference of mean parameters in a family of distributions with quadratic variance function. We first introduce a class of semi-parametric/parametric shrinkage estimators and establish their asymptotic optimality properties. Two specific cases, the location-scale family and the natural exponential family with quadratic variance function, are then studied in detail. We conduct a comprehensive simulation study to compare the performance of the proposed methods with existing shrinkage estimators. We also apply the method to real data and obtain encouraging results.

14.
J Am Stat Assoc ; 111(515): 951-966, 2016.
Article in English | MEDLINE | ID: mdl-28943680

ABSTRACT

To maintain proper cellular functions, over 50% of proteins encoded in the genome need to be transported to cellular membranes. The molecular mechanism behind such a process, often referred to as protein targeting, is not well understood. Single-molecule experiments are designed to unveil the detailed mechanisms and reveal the functions of different molecular machineries involved in the process. The experimental data consist of hundreds of stochastic time traces from the fluorescence recordings of the experimental system. We introduce a Bayesian hierarchical model on top of hidden Markov models (HMMs) to analyze these data and use the statistical results to answer the biological questions. In addition to resolving the biological puzzles and delineating the regulating roles of different molecular complexes, our statistical results enable us to propose a more detailed mechanism for the late stages of the protein targeting process.

15.
Proc Natl Acad Sci U S A ; 112(47): 14473-8, 2015 Nov 24.
Article in English | MEDLINE | ID: mdl-26553980

ABSTRACT

Accurate real-time tracking of influenza outbreaks helps public health officials make timely and meaningful decisions that could save lives. We propose an influenza tracking model, ARGO (AutoRegression with GOogle search data), that uses publicly available online search data. In addition to having a rigorous statistical foundation, ARGO outperforms all previously available Google-search-based tracking models, including the latest version of Google Flu Trends, even though it uses only low-quality search data as input from publicly available Google Trends and Google Correlate websites. ARGO not only incorporates the seasonality in influenza epidemics but also captures changes in people's online search behavior over time. ARGO is also flexible, self-correcting, robust, and scalable, making it a potentially powerful tool that can be used for real-time tracking of other social events at multiple temporal and spatial resolutions.


Subject(s)
Epidemics , Influenza, Human/epidemiology , Humans , Internet , Retrospective Studies , Search Engine
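
The idea behind entry 15 (an autoregression on past flu activity augmented with current search volumes) can be sketched roughly as below. The abstract does not state the estimator, so the ridge penalty, the function names, and the data layout here are all assumptions for illustration, not the ARGO implementation.

```python
import numpy as np

def build_design(flu, search, n_lags):
    """Stack lagged flu values and same-week search volumes into one matrix.

    flu: shape (T,) weekly flu activity; search: shape (T, k) search volumes.
    The row for week t holds [flu[t-1], ..., flu[t-n_lags], search[t, :]].
    """
    flu = np.asarray(flu, dtype=float)
    search = np.asarray(search, dtype=float)
    rows = [
        np.concatenate([flu[t - n_lags:t][::-1], search[t]])
        for t in range(n_lags, len(flu))
    ]
    return np.array(rows), flu[n_lags:]

def fit_ridge(X, y, penalty):
    """Ridge regression with an unpenalized intercept (closed form)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + penalty * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ coef, coef

def nowcast(flu_history, search_now, intercept, coef):
    """Estimate the current week; assumes at least one autoregressive lag."""
    n_lags = len(coef) - len(search_now)
    lags = np.asarray(flu_history, dtype=float)[-n_lags:][::-1]
    return float(intercept + np.concatenate([lags, search_now]) @ coef)
```

Refitting the coefficients each week on a sliding window of recent data is one way such a model could follow changes in search behavior over time.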
16.
Annu Rev Stat Appl ; 1: 465-492, 2014 Jan 01.
Article in English | MEDLINE | ID: mdl-25009825

ABSTRACT

Since the universal acceptance of atoms and molecules as the fundamental constituents of matter in the early twentieth century, molecular physics, chemistry and molecular biology have all experienced major theoretical breakthroughs. To be able to actually "see" biological macromolecules, one at a time in action, one had to wait until the 1970s. Since then the field of single-molecule biophysics has witnessed extensive growth both in experiments and theory. A distinct feature of single-molecule biophysics is that the motions and interactions of molecules and the transformation of molecular species are necessarily described in the language of stochastic processes, whether one investigates equilibrium or nonequilibrium living behavior. For laboratory measurements following a biological process, if it is sampled over time on individual participating molecules, then the analysis of experimental data naturally calls for the inference of stochastic processes. The theoretical and experimental developments of single-molecule biophysics thus present interesting questions and unique opportunities for applied statisticians and probabilists. In this article, we review some important statistical developments in connection to single-molecule biophysics, emphasizing the application of stochastic-process theory and the statistical questions arising from modeling and analyzing experimental data.

17.
Ann Appl Stat ; 6(3): 950-976, 2012 Sep 01.
Article in English | MEDLINE | ID: mdl-23408514

ABSTRACT

New advances in nanoscience open the door for scientists to study biological processes on a microscopic molecule-by-molecule basis. Recent single-molecule biophysical experiments on enzyme systems, in particular, reveal that enzyme molecules behave fundamentally differently from what the classical model predicts. A stochastic network model was previously proposed to explain the experimental discovery. This paper conducts detailed theoretical and data analyses of the stochastic network model, focusing on the correlation structure of the successive reaction times of a single enzyme molecule. We investigate the correlation of experimental fluorescence intensity and the correlation of enzymatic reaction times, and examine the role of substrate concentration in enzymatic reactions. Our study shows that the stochastic network model is capable of explaining the experimental data in depth.

18.
J Am Stat Assoc ; 107(500): 1465-1479, 2012 Dec.
Article in English | MEDLINE | ID: mdl-25301976

ABSTRACT

Hierarchical models are extensively studied and widely used in statistics and many other scientific areas. They provide an effective tool for combining information from similar resources and achieving partial pooling of inference. Since the seminal work by James and Stein (1961) and Stein (1962), shrinkage estimation has become one major focus for hierarchical models. For the homoscedastic normal model, it is well known that shrinkage estimators, especially the James-Stein estimator, have good risk properties. The heteroscedastic model, though more appropriate for practical applications, is less well studied, and it is unclear what types of shrinkage estimators are superior in terms of the risk. We propose in this paper a class of shrinkage estimators based on Stein's unbiased estimate of risk (SURE). We study asymptotic properties of various common estimators as the number of means to be estimated grows (p → ∞). We establish the asymptotic optimality property for the SURE estimators. We then extend our construction to create a class of semi-parametric shrinkage estimators and establish corresponding asymptotic optimality results. We emphasize that though the form of our SURE estimators is partially obtained through a normal model at the sampling level, their optimality properties do not heavily depend on such distributional assumptions. We apply the methods to two real data sets and obtain encouraging results.
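
As a rough illustration of SURE-based shrinkage in a heteroscedastic setting, the sketch below assumes observations X_i ~ N(theta_i, A_i) with known variances A_i and shrinks each X_i toward zero by the factor lam / (lam + A_i). The SURE expression follows from Stein's identity under that assumed model; the grid search and all names are assumptions for this sketch, and the paper's own estimators may differ in form.

```python
import numpy as np

def sure_risk(lam, x, a):
    """Stein's unbiased risk estimate for theta_hat_i = lam / (lam + a_i) * x_i.

    Under x_i ~ N(theta_i, a_i) this is an unbiased estimate of the
    average squared error of theta_hat, for a fixed lam >= 0.
    """
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    return float(np.mean((a / (a + lam)) ** 2 * x ** 2 + a * (lam - a) / (a + lam)))

def sure_shrink(x, a, grid):
    """Pick lam on a grid by minimizing SURE, then return (lam, estimates)."""
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    risks = [sure_risk(lam, x, a) for lam in grid]
    lam = float(grid[int(np.argmin(risks))])
    return lam, lam / (lam + a) * x

# Hypothetical data: noisier observations (larger a_i) are shrunk more.
lam, est = sure_shrink([3.0, -1.0, 0.5], [1.0, 4.0, 0.25], np.linspace(0.0, 20.0, 201))
print(lam, est)
```

Because the risk estimate depends only on the data and the known variances, the tuning parameter can be chosen without knowing the true means.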

19.
J Am Stat Assoc ; 107(500): 1558-1574, 2012 Dec.
Article in English | MEDLINE | ID: mdl-25328259

ABSTRACT

Diffusion process models are widely used in science, engineering and finance. Most diffusion processes are described by stochastic differential equations in continuous time. In practice, however, data is typically only observed at discrete time points. Except for a few very special cases, no analytic form exists for the likelihood of such discretely observed data. For this reason, parametric inference is often achieved by using discrete-time approximations, with accuracy controlled through the introduction of missing data. We present a new multiresolution Bayesian framework to address the inference difficulty. The methodology relies on the use of multiple approximations and extrapolation, and is significantly faster and more accurate than known strategies based on Gibbs sampling. We apply the multiresolution approach to three data-driven inference problems - one in biophysics and two in finance - one of which features a multivariate diffusion model with an entirely unobserved component.
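
The discrete-time approximation mentioned in the abstract can be made concrete with the Euler scheme. The sketch below uses an Ornstein-Uhlenbeck process, dX = theta (mu - X) dt + sigma dW, chosen here only because its exact transition density is known and can serve as a check; the process, parameter values, and function names are assumptions for this sketch, not the paper's examples.

```python
import math

def euler_transition(x, dt, theta, mu, sigma):
    """Mean and variance of X(t + dt) given X(t) = x under the Euler scheme."""
    return x + theta * (mu - x) * dt, sigma ** 2 * dt

def exact_ou_transition(x, dt, theta, mu, sigma):
    """Exact Gaussian transition of the Ornstein-Uhlenbeck process."""
    decay = math.exp(-theta * dt)
    return mu + (x - mu) * decay, sigma ** 2 * (1 - decay ** 2) / (2 * theta)

def gaussian_logpdf(y, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)

def euler_loglik(path, dt, theta, mu, sigma):
    """Approximate log-likelihood of a path observed at equal spacing dt."""
    total = 0.0
    for x, y in zip(path[:-1], path[1:]):
        mean, var = euler_transition(x, dt, theta, mu, sigma)
        total += gaussian_logpdf(y, mean, var)
    return total

# The Euler mean approaches the exact mean as the step shrinks.
for dt in (0.1, 0.01):
    approx, _ = euler_transition(1.0, dt, theta=1.0, mu=0.0, sigma=1.0)
    exact, _ = exact_ou_transition(1.0, dt, theta=1.0, mu=0.0, sigma=1.0)
    print(dt, abs(approx - exact))
```

The approximation error shrinks with the step size, which is why finer time grids (with the intermediate values treated as missing data) improve accuracy at a higher computational cost.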

20.
Stat Sin ; 21(4): 1687-1711, 2011 Oct 01.
Article in English | MEDLINE | ID: mdl-21969801

ABSTRACT

We provide a complete proof of the convergence of a recently developed sampling algorithm called the equi-energy (EE) sampler (Kou, Zhou, and Wong, 2006) in the case that the state space is countable. We show that in a countable state space, each sampling chain in the EE sampler is strongly ergodic a.s. with the desired steady-state distribution. Furthermore, all chains satisfy the individual ergodic property. We apply the EE sampler to the Ising model to test its efficiency, comparing it with the Metropolis algorithm and the parallel tempering algorithm. We observe that the dynamic exponent of the EE sampler is significantly smaller than those of parallel tempering and the Metropolis algorithm, demonstrating the high efficiency of the EE sampler.
