Results 1 - 20 of 35
1.
Proc Natl Acad Sci U S A ; 118(15)2021 04 13.
Article in English | MEDLINE | ID: mdl-33837150

ABSTRACT

Parameter estimation for nonlinear dynamic system models, represented by ordinary differential equations (ODEs), using noisy and sparse data, is a vital task in many fields. We propose a fast and accurate method, manifold-constrained Gaussian process inference (MAGI), for this task. MAGI uses a Gaussian process model over time series data, explicitly conditioned on the manifold constraint that derivatives of the Gaussian process must satisfy the ODE system. By doing so, we completely bypass the need for numerical integration and achieve substantial savings in computational time. MAGI is also suitable for inference with unobserved system components, which often occur in real experiments. MAGI is distinct from existing approaches as we provide a principled statistical construction under a Bayesian framework, which incorporates the ODE system through the manifold constraint. We demonstrate the accuracy and speed of MAGI using realistic examples based on physical experiments.

2.
Proc Natl Acad Sci U S A ; 117(22): 12004-12010, 2020 06 02.
Article in English | MEDLINE | ID: mdl-32414914

ABSTRACT

A catalytic prior distribution is designed to stabilize a high-dimensional "working model" by shrinking it toward a "simplified model." The shrinkage is achieved by supplementing the observed data with a small amount of "synthetic data" generated from a predictive distribution under the simpler model. We apply this framework to generalized linear models, where we propose various strategies for the specification of a tuning parameter governing the degree of shrinkage and study resultant theoretical properties. In simulations, the resulting posterior estimation using such a catalytic prior outperforms maximum likelihood estimation from the working model and is generally comparable with or superior to existing competitive methods in terms of frequentist prediction accuracy of point estimation and coverage accuracy of interval estimation. The catalytic priors have simple interpretations and are easy to formulate.


Subject(s)
Computer Simulation/statistics & numerical data , Linear Models , Bayes Theorem , Computer Simulation/trends , Data Analysis , Data Collection , Sample Size , Statistics as Topic
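
The synthetic-data idea in the abstract above can be sketched for logistic regression as follows. This is a minimal illustration, not the authors' implementation: the function names, the intercept-only "simplified model", the resampling of covariate rows, and the tau/M weighting of the synthetic log-likelihood are all assumptions made for the example.

```python
import math
import random

def log_lik_logistic(beta, X, y):
    """Bernoulli log-likelihood with logit link: sum of y*eta - log(1 + exp(eta))."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        # Numerically stable form of log(1 + exp(eta)).
        total += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total

def synthetic_data(X, y, M, rng):
    """Resample covariate rows and draw responses from an intercept-only model (assumed)."""
    p_hat = sum(y) / len(y)
    X_syn = [rng.choice(X) for _ in range(M)]
    y_syn = [1 if rng.random() < p_hat else 0 for _ in range(M)]
    return X_syn, y_syn

def catalytic_log_posterior(beta, X, y, X_syn, y_syn, tau):
    """Observed log-likelihood plus the synthetic log-likelihood down-weighted by tau/M."""
    M = len(y_syn)
    return log_lik_logistic(beta, X, y) + (tau / M) * log_lik_logistic(beta, X_syn, y_syn)
```

Under this weighting the M synthetic points together carry the weight of tau real observations, so tau plays the role of the tuning parameter that governs the degree of shrinkage; tau = 0 recovers the plain working-model likelihood.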
3.
J Med Internet Res ; 22(8): e16709, 2020 08 05.
Article in English | MEDLINE | ID: mdl-32755895

ABSTRACT

BACKGROUND: Chest computed tomography (CT) is crucial for the detection of lung cancer, and many automated CT evaluation methods have been proposed. Due to the divergent software dependencies of the reported approaches, the developed methods are rarely compared or reproduced. OBJECTIVE: The goal of the research was to generate reproducible machine learning modules for lung cancer detection and compare the approaches and performances of the award-winning algorithms developed in the Kaggle Data Science Bowl. METHODS: We obtained the source codes of all award-winning solutions of the Kaggle Data Science Bowl Challenge, where participants developed automated CT evaluation methods to detect lung cancer (training set n=1397, public test set n=198, final test set n=506). The performance of the algorithms was evaluated by the log-loss function, and the Spearman correlation coefficient of the performance in the public and final test sets was computed. RESULTS: Most solutions implemented distinct image preprocessing, segmentation, and classification modules. Variants of U-Net, VGGNet, and residual net were commonly used in nodule segmentation, and transfer learning was used in most of the classification algorithms. Substantial performance variations in the public and final test sets were observed (Spearman correlation coefficient = .39 among the top 10 teams). To ensure the reproducibility of results, we generated a Docker container for each of the top solutions. CONCLUSIONS: We compared the award-winning algorithms for lung cancer detection and generated reproducible Docker images for the top solutions. Although convolutional neural networks achieved decent accuracy, there is plenty of room for improvement regarding model generalizability.


Subject(s)
Lung Neoplasms/diagnostic imaging , Lung Neoplasms/diagnosis , Machine Learning/standards , Tomography, X-Ray Computed/methods , Algorithms , Humans , Reproducibility of Results
4.
Proc Natl Acad Sci U S A ; 112(47): 14473-8, 2015 Nov 24.
Article in English | MEDLINE | ID: mdl-26553980

ABSTRACT

Accurate real-time tracking of influenza outbreaks helps public health officials make timely and meaningful decisions that could save lives. We propose an influenza tracking model, ARGO (AutoRegression with GOogle search data), that uses publicly available online search data. In addition to having a rigorous statistical foundation, ARGO outperforms all previously available Google-search-based tracking models, including the latest version of Google Flu Trends, even though it uses only low-quality search data as input from publicly available Google Trends and Google Correlate websites. ARGO not only incorporates the seasonality in influenza epidemics but also captures changes in people's online search behavior over time. ARGO is also flexible, self-correcting, robust, and scalable, making it a potentially powerful tool that can be used for real-time tracking of other social events at multiple temporal and spatial resolutions.


Subject(s)
Epidemics , Influenza, Human/epidemiology , Humans , Internet , Retrospective Studies , Search Engine
5.
Proteins ; 85(8): 1402-1412, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28378911

ABSTRACT

In the prediction of protein structure from amino acid sequence, loops are challenging regions for computational methods. Since loops are often located on the protein surface, they can have significant roles in determining protein functions and binding properties. Loop prediction without the aid of a structural template requires extensive conformational sampling and energy minimization, which are computationally difficult. In this article we present a new de novo loop sampling method, the Parallely filtered Energy Targeted All-atom Loop Sampler (PETALS), to rapidly locate low energy conformations. PETALS explores both backbone and side-chain positions of the loop region simultaneously according to the energy function selected by the user, and constructs a nonredundant ensemble of low energy loop conformations using filtering criteria. The method is illustrated with the DFIRE potential and DiSGro energy function for loops, and shown to be highly effective at discovering conformations with near-native (or better) energy. Using the same energy function as the DiSGro algorithm, PETALS samples conformations with both lower RMSDs and lower energies. PETALS is also useful for assessing the accuracy of different energy functions. PETALS runs rapidly, requiring an average time cost of 10 minutes for a length 12 loop on a single 3.2 GHz processor core, comparable to the fastest existing de novo methods for generating an ensemble of conformations.


Subject(s)
Algorithms , Amino Acids/chemistry , Computational Biology/methods , Proteins/chemistry , Amino Acid Sequence , Computer Simulation , Models, Molecular , Protein Conformation, alpha-Helical , Protein Interaction Domains and Motifs , Thermodynamics
6.
BMC Infect Dis ; 17(1): 332, 2017 05 08.
Article in English | MEDLINE | ID: mdl-28482810

ABSTRACT

BACKGROUND: Accurate influenza activity forecasting helps public health officials prepare and allocate resources for unusual influenza activity. Traditional flu surveillance systems, such as the Centers for Disease Control and Prevention's (CDC) influenza-like illnesses reports, lag behind real-time by 1 to 2 weeks, whereas information contained in cloud-based electronic health records (EHR) and in Internet users' search activity is typically available in near real-time. We present a method that combines the information from these two data sources with historical flu activity to produce national flu forecasts for the United States up to 4 weeks ahead of the publication of CDC's flu reports. METHODS: We extend a method originally designed to track flu using Google searches, named ARGO, to combine information from EHR and Internet searches with historical flu activities. Our regularized multivariate regression model dynamically selects the most appropriate variables for flu prediction every week. The model is assessed for the flu seasons within the time period 2013-2016 using multiple metrics including root mean squared error (RMSE). RESULTS: Our method reduces the RMSE of the publicly available alternative (Healthmap flutrends) method by 33%, 20%, 17%, and 21% for the four time horizons: real-time, 1, 2, and 3 weeks ahead, respectively. Such accuracy improvements are statistically significant at the 5% level. Our real-time estimates correctly identified the peak timing and magnitude of the studied flu seasons. CONCLUSIONS: Our method significantly reduces the prediction error when compared to historical publicly available Internet-based prediction systems, demonstrating that: (1) the method to combine data sources is as important as data quality; (2) effectively extracting information from a cloud-based EHR and Internet search activity leads to accurate forecast of flu.


Subject(s)
Centers for Disease Control and Prevention, U.S. , Electronic Health Records , Influenza, Human/epidemiology , Forecasting , Humans , Internet , Population Surveillance/methods , Seasons , United States
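
The RMSE comparison reported in the abstract above amounts to a relative error reduction against a baseline forecast. A minimal sketch, with hypothetical function names and no connection to the study's actual data or code:

```python
import math

def rmse(pred, truth):
    """Root mean squared error between predictions and observed values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))

def rmse_reduction_pct(pred_new, pred_base, truth):
    """Percentage by which the new method lowers the baseline's RMSE."""
    base = rmse(pred_base, truth)
    return 100.0 * (base - rmse(pred_new, truth)) / base
```

A value of 33 from `rmse_reduction_pct` would correspond to the kind of "reduces the RMSE by 33%" statement made for the real-time horizon.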
7.
Ann Stat ; 44(2): 564-597, 2016 Mar 01.
Article in English | MEDLINE | ID: mdl-27041778

ABSTRACT

This paper discusses the simultaneous inference of mean parameters in a family of distributions with quadratic variance function. We first introduce a class of semi-parametric/parametric shrinkage estimators and establish their asymptotic optimality properties. Two specific cases, the location-scale family and the natural exponential family with quadratic variance function, are then studied in detail. We conduct a comprehensive simulation study to compare the performance of the proposed methods with existing shrinkage estimators. We also apply the method to real data and obtain encouraging results.

8.
J Phys Chem B ; 127(11): 2362-2374, 2023 Mar 23.
Article in English | MEDLINE | ID: mdl-36893480

ABSTRACT

Ordinary differential equation (ODE) models are widely used to describe chemical or biological processes. This Article considers the estimation and assessment of such models on the basis of time-course data. Due to experimental limitations, time-course data are often noisy, and some components of the system may not be observed. Furthermore, the computational demands of numerical integration have hindered the widespread adoption of time-course analysis using ODEs. To address these challenges, we explore the efficacy of the recently developed MAGI (MAnifold-constrained Gaussian process Inference) method for ODE inference. First, via a range of examples we show that MAGI is capable of inferring the parameters and system trajectories, including unobserved components, with appropriate uncertainty quantification. Second, we illustrate how MAGI can be used to assess and select different ODE models with time-course data based on MAGI's efficient computation of model predictions. Overall, we believe MAGI is a useful method for the analysis of time-course data in the context of ODE models, which bypasses the need for any numerical integration.

9.
Stat Sin ; 21(4): 1687-1711, 2011 Oct 01.
Article in English | MEDLINE | ID: mdl-21969801

ABSTRACT

We provide a complete proof of the convergence of a recently developed sampling algorithm called the equi-energy (EE) sampler (Kou, Zhou, and Wong, 2006) in the case that the state space is countable. We show that in a countable state space, each sampling chain in the EE sampler is strongly ergodic a.s. with the desired steady-state distribution. Furthermore, all chains satisfy the individual ergodic property. We apply the EE sampler to the Ising model to test its efficiency, comparing it with the Metropolis algorithm and the parallel tempering algorithm. We observe that the dynamic exponent of the EE sampler is significantly smaller than those of parallel tempering and the Metropolis algorithm, demonstrating the high efficiency of the EE sampler.
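
For orientation, the Metropolis baseline mentioned in the abstract above can be sketched for a small two-dimensional Ising model. This is only the single-spin-flip Metropolis algorithm, not the EE sampler; the lattice size, coupling J = 1, periodic boundaries, and function names are assumptions chosen for the illustration.

```python
import math
import random

def ising_energy(spins):
    """Energy E = -sum over nearest-neighbour pairs of s_i * s_j (J = 1, periodic)."""
    L = len(spins)
    return -sum(
        spins[i][j] * (spins[(i + 1) % L][j] + spins[i][(j + 1) % L])
        for i in range(L)
        for j in range(L)
    )

def metropolis_ising(L, beta, sweeps, seed=0):
    """Run single-spin-flip Metropolis updates on an L x L lattice."""
    rng = random.Random(seed)
    spins = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    for _ in range(sweeps * L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        neighbours = (
            spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
            + spins[i][(j + 1) % L] + spins[i][(j - 1) % L]
        )
        d_energy = 2 * spins[i][j] * neighbours  # energy change if this spin flips
        # Accept downhill moves always, uphill moves with probability exp(-beta * dE).
        if d_energy <= 0 or rng.random() < math.exp(-beta * d_energy):
            spins[i][j] = -spins[i][j]
    return spins
```

At low temperature (large `beta`) such a chain moves slowly between energy basins, which is the kind of inefficiency that methods such as parallel tempering and the EE sampler are designed to reduce.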

10.
Sci Rep ; 11(1): 4023, 2021 02 17.
Article in English | MEDLINE | ID: mdl-33597556

ABSTRACT

For epidemics control and prevention, timely insights of potential hot spots are invaluable. Alternative to traditional epidemic surveillance, which often lags behind real time by weeks, big data from the Internet provide important information of the current epidemic trends. Here we present a methodology, ARGOX (Augmented Regression with GOogle data CROSS space), for accurate real-time tracking of state-level influenza epidemics in the United States. ARGOX combines Internet search data at the national, regional and state levels with traditional influenza surveillance data from the Centers for Disease Control and Prevention, and accounts for both the spatial correlation structure of state-level influenza activities and the evolution of people's Internet search pattern. ARGOX achieves on average 28% error reduction over the best alternative for real-time state-level influenza estimation for 2014 to 2020. ARGOX is robust and reliable and can be potentially applied to track county- and city-level influenza activity and other infectious diseases.


Subject(s)
Epidemics/prevention & control , Influenza, Human/epidemiology , Internet Use/trends , Big Data , Centers for Disease Control and Prevention, U.S. , Epidemics/statistics & numerical data , Epidemiological Monitoring , Humans , Internet/trends , Population Surveillance/methods , Search Engine/trends , United States/epidemiology
11.
NPJ Precis Oncol ; 5(1): 82, 2021 Sep 10.
Article in English | MEDLINE | ID: mdl-34508179

ABSTRACT

Immune checkpoint inhibitors have demonstrated significant survival benefits in treating many types of cancers. However, their immune-related adverse events (irAEs) have not been systematically evaluated across cancer types in large-scale real-world populations. To address this gap, we conducted real-world data analyses using nationwide insurance claims data with 85.97 million enrollees across 8 years. We identified a significantly increased risk of developing irAEs among patients receiving immunotherapy agents in all seven cancer types commonly treated with immune checkpoint inhibitors. Within the first 6 months after treatment initialization, those receiving immunotherapy were 1.50-4.00 times (95% CI, lower bound from 1.15 to 2.16, upper bound from 1.69 to 20.36) more likely to develop irAEs than matched chemotherapy or targeted therapy groups, with a total of 92,858 patients. The risk of developing irAEs among patients using nivolumab is higher compared to those using pembrolizumab. These results confirmed the need for clinicians to assess irAEs among cancer patients undergoing immunotherapy as part of management. Our methods are extensible to characterizing the effectiveness and adverse effects of novel treatments in large populations in an efficient and economical fashion.

12.
Clin Pharmacol Ther ; 107(2): 388-396, 2020 02.
Article in English | MEDLINE | ID: mdl-31356677

ABSTRACT

The autoimmune adverse effects of lung cancer immunotherapy are not fully understood at the population level. Using observational data from commercial health insurance claims, we compared autoimmune diseases risk of immune checkpoint inhibitors (including pembrolizumab and nivolumab) and that of chemotherapy using the matching method. By 6 months after treatment initialization, the cumulative incidence of new autoimmune diseases among patients receiving immunotherapy was 13.13% (95% confidence interval (CI), 10.79-15.50%) and that of the matched chemotherapy patients was 6.65% (95% CI, 5.79-7.50%), constituting a hazard ratio (HR) of 1.97 (95% CI, 1.58-2.48). Both pembrolizumab (HR = 2.06 (95% CI, 1.20-3.65), P = 0.0032) and nivolumab (HR = 1.76 (95% CI, 1.39-2.24), P < 0.0001) were associated with higher risks of developing autoimmune diseases, especially for hypothyroidism (P < 0.0001). Our findings suggest the need to monitor autoimmune side effects of immunotherapy.


Subject(s)
Antibodies, Monoclonal, Humanized/adverse effects , Antineoplastic Agents, Immunological/adverse effects , Autoimmune Diseases/chemically induced , Lung Neoplasms/drug therapy , Nivolumab/adverse effects , Adult , Aged , Antibodies, Monoclonal, Humanized/therapeutic use , Antineoplastic Agents, Immunological/therapeutic use , Female , Humans , Male , Middle Aged , Nivolumab/therapeutic use
13.
Waste Manag ; 29(2): 621-8, 2009 Feb.
Article in English | MEDLINE | ID: mdl-18691863

ABSTRACT

This paper aims to investigate the fresh and hardened properties of lightweight aggregate concretes that are prepared with the use of recycled plastic waste sourced from scraped PVC pipes to replace river sand as fine aggregates. A number of laboratory prepared concrete mixes were tested, in which river sand was partially replaced by PVC plastic waste granules in percentages of 0%, 5%, 15%, 30% and 45% by volume. Two major findings are identified. The positive side shows that the concrete prepared with a partial replacement by PVC was lighter (lower density), was more ductile (greater Poisson's ratios and reduced modulus of elasticity), and had lower drying shrinkage and higher resistance to chloride ion penetration. The negative side reveals that the workability, compressive strength and tensile splitting strength of the concretes were reduced. The results gathered would form a part of useful information for recycling PVC plastic waste in lightweight concrete mixes.


Subject(s)
Conservation of Natural Resources/methods , Construction Materials/analysis , Polyvinyl Chloride/chemistry , Materials Testing , Particle Size
14.
Sci Rep ; 9(1): 5238, 2019 03 27.
Article in English | MEDLINE | ID: mdl-30918276

ABSTRACT

Accurate, high-resolution tracking of influenza epidemics at the regional level helps public health agencies make informed and proactive decisions, especially in the face of outbreaks. Internet users' online searches offer great potential for the regional tracking of influenza. However, due to the complex data structure and reduced quality of Internet data at the regional level, few established methods provide satisfactory performance. In this article, we propose a novel method named ARGO2 (2-step Augmented Regression with GOogle data) that efficiently combines publicly available Google search data at different resolutions (national and regional) with traditional influenza surveillance data from the Centers for Disease Control and Prevention (CDC) for accurate, real-time regional tracking of influenza. ARGO2 gives very competitive performance across all US regions compared with available Internet-data-based regional influenza tracking methods, and it has achieved 30% error reduction over the best alternative method that we numerically tested for the period of March 2009 to March 2018. ARGO2 is reliable and robust, with the flexibility to incorporate additional information from other sources and resolutions, making it a powerful tool for regional influenza tracking, and potentially for tracking other social, economic, or public health events at the regional or local level.


Subject(s)
Data Mining , Epidemiological Monitoring , Influenza, Human/epidemiology , Internet , Humans
15.
J Phys Chem B ; 111(9): 2377-84, 2007 Mar 08.
Article in English | MEDLINE | ID: mdl-17288472

ABSTRACT

The intermittent emission of fluorescent light from single enzymes, quantum dots, and other nanoscale systems is often characterized by statistical correlations in the emitted signal. A one-dimensional model of such correlations in enzymes, based on a model of protein conformational fluctuations developed by Kou and Xie (Phys. Rev. Lett. 2004, 93, 180603), is formulated in the present paper in terms of the dynamics of a particle moving stochastically between "on" and "off" states under the action of fractional Gaussian noise. The model yields predictions for the short and long time behavior of the following quantities: the time correlation function, C(t), of the fluctuations of the signal intensity, the distribution, f(t), of time intervals between intensity fluctuations, and the Mandel parameter, Q(t), describing the extent of bunching or anti-bunching in the signal. At short times, C(t) and f(t) are found to decay exponentially, while, at long times, they are found to decay as power laws, the exponents being functions solely of the nature of the temporal correlations in the noise. The results are in good qualitative agreement with results from single-molecule experiments on fluorescence intermittency in the enzyme cholesterol oxidase carried out by Xie and co-workers (Science 1998, 282, 1877). The Mandel parameter, Q(t), for this model is positive at short and long times, indicating super-Poisson statistics in these limits, consistent with bunching of the fluorescent signal.


Subject(s)
Enzymes/chemistry , Spectrometry, Fluorescence/methods , Biophysics/methods , Chemistry, Physical/methods , Fluorescent Dyes/pharmacology , Fourier Analysis , Models, Statistical , Normal Distribution , Poisson Distribution , Protein Conformation , Time Factors
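
The Mandel parameter mentioned in the abstract above is commonly defined from photon counts n in a window of length t as Q = (var(n) - mean(n)) / mean(n). A minimal sketch under that standard definition, with a hypothetical function name; the abstract itself does not state the formula.

```python
def mandel_q(counts):
    """Mandel Q from photon counts per time window: variance / mean - 1."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / n
    return var / mean - 1.0
```

Q = 0 corresponds to Poisson statistics, Q > 0 to super-Poisson statistics (bunching), and Q < 0 to sub-Poisson statistics (anti-bunching).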
16.
J Phys Chem B ; 110(41): 20093-7, 2006 Oct 19.
Article in English | MEDLINE | ID: mdl-17034179

ABSTRACT

Enzymes are dynamic entities: both their conformation and catalytic activity fluctuate over time. When such fluctuations are relatively fast, it is not surprising that the classical Michaelis-Menten (MM) relationship between the steady-state enzymatic velocity and the substrate concentration still holds. However, recent single-molecule experiments have shown that this is the case even for an enzyme whose catalytic activity fluctuates on the 10⁻⁴ to 10 s range. The purpose of this paper is to examine various scenarios in which slowly fluctuating enzymes would still obey the MM relationship. Specifically, we consider (1) the quasi-static condition (e.g., the conformational fluctuation of the enzyme-substrate complex is much slower than binding, catalysis, and the conformational fluctuations of the free enzyme), (2) the quasi-equilibrium condition (when the substrate dissociation is much faster than catalysis, irrespective of the time scales or amplitudes of conformational fluctuations), and (3) the conformational-equilibrium condition (when the dissociation and catalytic rates depend on the conformational coordinate in the same way). For each of these scenarios, the physical meaning of the apparent Michaelis constant and catalytic rate constant is provided. Finally, as an example, the theoretical analysis of a recent single-molecule enzyme assay is considered in light of the perspectives presented in this paper.


Subject(s)
Biophysics/methods , Chemistry, Physical/methods , Enzymes/chemistry , Catalysis , Diffusion , Kinetics , Models, Chemical , Models, Molecular , Models, Statistical , Models, Theoretical , Molecular Conformation , Thermodynamics , beta-Galactosidase/chemistry
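
The classical MM relationship referred to in the abstract above is the hyperbolic law v = k_cat [S] / (K_M + [S]), here written per enzyme molecule. A minimal sketch with hypothetical parameter names:

```python
def mm_velocity(substrate, k_cat, k_m):
    """Michaelis-Menten velocity per enzyme: k_cat * [S] / (K_M + [S])."""
    return k_cat * substrate / (k_m + substrate)
```

The velocity reaches half of `k_cat` when the substrate concentration equals `k_m` and saturates at `k_cat` for large concentrations; in the scenarios discussed above, `k_cat` and `k_m` would be read as apparent constants.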
17.
J Am Stat Assoc ; 111(513): 314-330, 2016.
Article in English | MEDLINE | ID: mdl-27212739

ABSTRACT

This paper studies the estimation of a stepwise signal. To determine the number and locations of change-points of the stepwise signal, we formulate a maximum marginal likelihood estimator, which can be computed with a quadratic cost using dynamic programming. We carry out an extensive investigation of the choice of the prior distribution and study the asymptotic properties of the maximum marginal likelihood estimator. We propose to treat each possible set of change-points equally and adopt an empirical Bayes approach to specify the prior distribution of segment parameters. A detailed simulation study is performed to compare the effectiveness of this method with other existing methods. We demonstrate our method on single-molecule enzyme reaction data and on DNA array CGH data. Our study shows that this method is applicable to a wide range of models and offers appealing results in practice.
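
The dynamic-programming idea behind change-point search can be sketched as follows. Note the substitution: this example minimizes a least-squares segment cost for a fixed number of segments, which is a simpler stand-in and not the paper's marginal-likelihood estimator; the function name and interface are hypothetical.

```python
def segment_least_squares(y, k):
    """Split y into k contiguous segments minimizing the total squared error.

    Returns (change_points, total_cost); change_points are the start indices
    of segments 2..k. Runs in O(k * n^2) time using prefix sums.
    """
    n = len(y)
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        # Sum of squared deviations of y[i:j] from its own mean (requires j > i).
        s = s1[j] - s1[i]
        return (s2[j] - s2[i]) - s * s / (j - i)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    back[m][j] = i

    # Walk back through the stored split points to recover the segmentation.
    change_points = []
    j = n
    for m in range(k, 1, -1):
        j = back[m][j]
        change_points.append(j)
    return sorted(change_points), best[k][n]
```

Because each segment's cost is available in constant time from the prefix sums, the search over all split points costs a number of operations quadratic in the signal length for each segment count.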

18.
J Am Stat Assoc ; 111(515): 951-966, 2016.
Article in English | MEDLINE | ID: mdl-28943680

ABSTRACT

To maintain proper cellular functions, over 50% of proteins encoded in the genome need to be transported to cellular membranes. The molecular mechanism behind such a process, often referred to as protein targeting, is not well understood. Single-molecule experiments are designed to unveil the detailed mechanisms and reveal the functions of different molecular machineries involved in the process. The experimental data consist of hundreds of stochastic time traces from the fluorescence recordings of the experimental system. We introduce a Bayesian hierarchical model on top of hidden Markov models (HMMs) to analyze these data and use the statistical results to answer the biological questions. In addition to resolving the biological puzzles and delineating the regulating roles of different molecular complexes, our statistical results enable us to propose a more detailed mechanism for the late stages of the protein targeting process.

19.
J Phys Chem B ; 109(41): 19068-81, 2005 Oct 20.
Article in English | MEDLINE | ID: mdl-16853459

ABSTRACT

This paper summarizes our present theoretical understanding of single-molecule kinetics associated with the Michaelis-Menten mechanism of enzymatic reactions. Single-molecule enzymatic turnover experiments typically measure the probability density f(t) of the stochastic waiting time t for individual turnovers. While f(t) can be reconciled with ensemble kinetics, it contains more information than the ensemble data; in particular, it provides crucial information on dynamic disorder, the apparent fluctuation of the catalytic rates due to the interconversion among the enzyme's conformers with different catalytic rate constants. In the presence of dynamic disorder, f(t) exhibits a highly stretched multiexponential decay at high substrate concentrations and a monoexponential decay at low substrate concentrations. We derive a single-molecule Michaelis-Menten equation for the reciprocal of the first moment of f(t), 1/⟨t⟩, which shows a hyperbolic dependence on the substrate concentration [S], similar to the ensemble enzymatic velocity. We prove that this single-molecule Michaelis-Menten equation holds under many conditions, in particular when the interconversion rates among different enzyme conformers are slower than the catalytic rate. However, unlike the conventional interpretation, the apparent catalytic rate constant and the apparent Michaelis constant in this single-molecule Michaelis-Menten equation are complicated functions of the catalytic rate constants of individual conformers. We also suggest that the randomness parameter r, defined as ⟨(t - ⟨t⟩)²⟩/⟨t⟩², can serve as an indicator for dynamic disorder in the catalytic step of the enzymatic reaction, as it becomes larger than unity at high substrate concentrations in the presence of dynamic disorder.


Subject(s)
Algorithms , Enzymes/chemistry , Kinetics , Models, Chemical , Models, Statistical
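
The randomness parameter in the abstract above is the variance of the waiting time divided by the squared mean. The sketch below computes it from a sample and, as a toy model of slowly interconverting conformers, from a mixture of exponential waiting-time densities; the mixture model and function names are assumptions for illustration, not the paper's derivation.

```python
def randomness_parameter(waiting_times):
    """Sample estimate of r = variance of t divided by the squared mean of t."""
    n = len(waiting_times)
    mean = sum(waiting_times) / n
    var = sum((t - mean) ** 2 for t in waiting_times) / n
    return var / mean ** 2

def mixture_randomness(weights, rates):
    """Exact r when f(t) is a mixture of exponentials with the given rates."""
    m1 = sum(w / k for w, k in zip(weights, rates))            # first moment
    m2 = sum(2.0 * w / k ** 2 for w, k in zip(weights, rates))  # second moment
    return (m2 - m1 ** 2) / m1 ** 2
```

A single exponential gives r = 1, while a mixture of conformers with different rates gives r > 1, matching the statement that r exceeds unity when dynamic disorder is present.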
20.
Annu Rev Stat Appl ; 1: 465-492, 2014 Jan 01.
Article in English | MEDLINE | ID: mdl-25009825

ABSTRACT

Since the universal acceptance of atoms and molecules as the fundamental constituents of matter in the early twentieth century, molecular physics, chemistry and molecular biology have all experienced major theoretical breakthroughs. To be able to actually "see" biological macromolecules, one at a time in action, one had to wait until the 1970s. Since then the field of single-molecule biophysics has witnessed extensive growth both in experiments and theory. A distinct feature of single-molecule biophysics is that the motions and interactions of molecules and the transformation of molecular species are necessarily described in the language of stochastic processes, whether one investigates equilibrium or nonequilibrium living behavior. For laboratory measurements following a biological process, if it is sampled over time on individual participating molecules, then the analysis of experimental data naturally calls for the inference of stochastic processes. The theoretical and experimental developments of single-molecule biophysics thus present interesting questions and unique opportunities for applied statisticians and probabilists. In this article, we review some important statistical developments in connection to single-molecule biophysics, emphasizing the application of stochastic-process theory and the statistical questions arising from modeling and analyzing experimental data.
