1.
Extremes (Boston) ; 26(3): 573-594, 2023.
Article in English | MEDLINE | ID: mdl-37581203

ABSTRACT

Confounding variables are a recurrent challenge for causal discovery and inference. In many situations, complex causal mechanisms only manifest themselves in extreme events, or take simpler forms in the extremes. Stimulated by data on extreme river flows and precipitation, we introduce a new causal discovery methodology for heavy-tailed variables that allows the effect of a known potential confounder to be almost entirely removed when the variables have comparable tails, and also decreases it sufficiently to enable correct causal inference when the confounder has a heavier tail. We also introduce a new parametric estimator for the existing causal tail coefficient and a permutation test. Simulations show that the methods work well and the ideas are applied to the motivating dataset. Supplementary Information: The online version contains supplementary material available at 10.1007/s10687-022-00456-4.

2.
R Soc Open Sci ; 8(9): 202097, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34631116

ABSTRACT

We use a combination of extreme value statistics, survival analysis and computer-intensive methods to analyse the mortality of Italian and French semi-supercentenarians. After accounting for the effects of the sampling frame, extreme-value modelling leads to the conclusion that constant force of mortality beyond 108 years describes the data well and there is no evidence of differences between countries and cohorts. These findings are consistent with use of a Gompertz model and with previous analysis of the International Database on Longevity and suggest that any physical upper bound for the human lifespan is so large that it is unlikely to be approached. Power calculations make it implausible that there is an upper bound below 130 years. There is no evidence of differences in survival between women and men after age 108 in the Italian data and the International Database on Longevity, but survival is lower for men in the French data.

3.
R Soc Open Sci ; 7(7): 200462, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32874640

ABSTRACT

If an artificial intelligence aims to maximize risk-adjusted return, then under mild conditions it is disproportionately likely to pick an unethical strategy unless the objective function allows sufficiently for this risk. Even if the proportion η of available unethical strategies is small, the probability p_U of picking an unethical strategy can become large; indeed, unless returns are fat-tailed, p_U tends to unity as the strategy space becomes large. We define an unethical odds ratio, Υ (capital upsilon), that allows us to calculate p_U from η, and we derive a simple formula for the limit of Υ as the strategy space becomes large. We discuss the estimation of Υ and p_U in finite cases and how to deal with infinite strategy spaces. We show how the principle can be used to help detect unethical strategies and to estimate η. Finally we sketch some policy implications of this work.

4.
PLoS Comput Biol ; 16(6): e1007882, 2020 06.
Article in English | MEDLINE | ID: mdl-32492067

ABSTRACT

Molecular quantitative trait locus (QTL) analyses are increasingly popular to explore the genetic architecture of complex traits, but existing studies do not leverage shared regulatory patterns and suffer from a large multiplicity burden, which hampers the detection of weak signals such as trans associations. Here, we present a fully multivariate proteomic QTL (pQTL) analysis performed with our recently proposed Bayesian method LOCUS on data from two clinical cohorts, with plasma protein levels quantified by mass-spectrometry and aptamer-based assays. Our two-stage study identifies 136 pQTL associations in the first cohort, of which >80% replicate in the second independent cohort and have significant enrichment with functional genomic elements and disease risk loci. Moreover, 78% of the pQTLs whose protein abundance was quantified by both proteomic techniques are confirmed across assays. Our thorough comparisons with standard univariate QTL mapping on (1) these data and (2) synthetic data emulating the real data show how LOCUS borrows strength across correlated protein levels and markers on a genome-wide scale to effectively increase statistical power. Notably, 15% of the pQTLs uncovered by LOCUS would be missed by the univariate approach, including several trans and pleiotropic hits with successful independent validation. Finally, the analysis of extensive clinical data from the two cohorts indicates that the genetically-driven proteins identified by LOCUS are enriched in associations with low-grade inflammation, insulin resistance and dyslipidemia and might therefore act as endophenotypes for metabolic diseases. While considerations on the clinical role of the pQTLs are beyond the scope of our work, these findings generate useful hypotheses to be explored in future research; all results are accessible online from our searchable database. 
Thanks to its efficient variational Bayes implementation, LOCUS can analyze jointly thousands of traits and millions of markers. Its applicability goes beyond pQTL studies, opening new perspectives for large-scale genome-wide association and QTL analyses. Diet, Obesity and Genes (DiOGenes) trial registration number: NCT00390637.


Subjects
Bayes Theorem; Blood Proteins/genetics; Quantitative Trait Loci; Biomarkers/blood; Genome-Wide Association Study; Humans
5.
Ann Appl Stat ; 14(2): 905-928, 2020 Jun.
Article in English | MEDLINE | ID: mdl-34992707

ABSTRACT

We tackle modelling and inference for variable selection in regression problems with many predictors and many responses. We focus on detecting hotspots, that is, predictors associated with several responses. Such a task is critical in statistical genetics, as hotspot genetic variants shape the architecture of the genome by controlling the expression of many genes and may initiate decisive functional mechanisms underlying disease endpoints. Existing hierarchical regression approaches designed to model hotspots suffer from two limitations: their discrimination of hotspots is sensitive to the choice of top-level scale parameters for the propensity of predictors to be hotspots, and they do not scale to large predictor and response vectors, for example, of dimensions 10^3-10^5 in genetic applications. We address these shortcomings by introducing a flexible hierarchical regression framework that is tailored to the detection of hotspots and scalable to the above dimensions. Our proposal implements a fully Bayesian model for hotspots based on the horseshoe shrinkage prior. Its global-local formulation shrinks noise globally and, hence, accommodates the highly sparse nature of genetic analyses while being robust to individual signals, thus leaving the effects of hotspots unshrunk. Inference is carried out using a fast variational algorithm coupled with a novel simulated annealing procedure that allows efficient exploration of multimodal distributions.

6.
Biostatistics ; 19(2): 153-168, 2018 04 01.
Article in English | MEDLINE | ID: mdl-29106444

ABSTRACT

Independence of genes is commonly but incorrectly assumed in microarray data analysis; rather, genes are activated in co-regulated sets referred to as modules. In this article, we develop an automatic method to define modules common to multiple independent studies. We use an empirical Bayes procedure to estimate a sparse correlation matrix for all studies, identify modules by clustering, and develop an extreme-value-based method to detect so-called scattered genes, which do not belong to any module. The resulting algorithm is very fast and produces accurate modules in simulation studies. Application to real data identifies modules with significant enrichment and results in a huge dimension reduction, which can alleviate the computational burden of further analyses.


Subjects
Biostatistics/methods; Computational Biology/methods; Gene Expression; Models, Statistical; Oligonucleotide Array Sequence Analysis/methods; Bayes Theorem; Computer Simulation; Humans
7.
Biostatistics ; 18(4): 618-636, 2017 Oct 01.
Article in English | MEDLINE | ID: mdl-28334312

ABSTRACT

Combined inference for heterogeneous high-dimensional data is critical in modern biology, where clinical and various kinds of molecular data may be available from a single study. Classical genetic association studies regress a single clinical outcome on many genetic variants one by one, but there is an increasing demand for joint analysis of many molecular outcomes and genetic variants in order to unravel functional interactions. Unfortunately, most existing approaches to joint modeling are either too simplistic to be powerful or are impracticable for computational reasons. Inspired by Richardson and others (2010, Bayesian Statistics 9), we consider a sparse multivariate regression model that allows simultaneous selection of predictors and associated responses. As Markov chain Monte Carlo (MCMC) inference on such models can be prohibitively slow when the number of genetic variants exceeds a few thousand, we propose a variational inference approach which produces posterior information very close to that of MCMC inference, at a much reduced computational cost. Extensive numerical experiments show that our approach outperforms popular variable selection methods and tailored Bayesian procedures, dealing within hours with problems involving hundreds of thousands of genetic variants and tens to hundreds of clinical or molecular outcomes.


Subjects
Genetic Association Studies/methods; Genetic Variation; Markov Chains; Models, Statistical; Monte Carlo Method; Humans
8.
Biostatistics ; 16(4): 686-700, 2015 Oct.
Article in English | MEDLINE | ID: mdl-25987649

ABSTRACT

Meta-analysis of microarray studies to produce an overall gene list is relatively straightforward when complete data are available. When some studies lack information (providing only a ranked list of genes, for example), it is common to reduce all studies to ranked lists prior to combining them. Since this entails a loss of information, we consider a hierarchical Bayes approach to meta-analysis using different types of information from different studies: the full data matrix, summary statistics, or ranks. The model uses an informative prior for the parameter of interest to aid the detection of differentially expressed genes. Simulations show that the new approach can give substantial power gains compared with classical meta-analysis and list aggregation methods. A meta-analysis of 11 published studies with different data types identifies genes known to be involved in ovarian cancer and shows significant enrichment.


Subjects
Gene Expression; Meta-Analysis as Topic; Microarray Analysis/methods; Models, Statistical; Ovarian Neoplasms/genetics; Female; Humans
9.
Biomed Opt Express ; 3(6): 1365-80, 2012 Jun 01.
Article in English | MEDLINE | ID: mdl-22741082

ABSTRACT

Diabetes is characterized by hyperglycemia that can result from the loss of pancreatic insulin-secreting β-cells in the islets of Langerhans. We analyzed ex vivo the entire gastric and duodenal lobes of a murine pancreas using extended-focus Optical Coherence Microscopy (xfOCM). To identify and quantify the islets of Langerhans observed in xfOCM tomograms we implemented an active contour algorithm based on the level set method. We show that xfOCM reveals a three-dimensional islet distribution consistent with Optical Projection Tomography, albeit with a higher resolution that also enables the detection of the smallest islets (≤ 8000 µm³). Although this category of the smallest islets represents only a negligible volume compared to the total β-cell volume, a recent study suggests that these islets, located at the periphery, are the first to be destroyed when type I diabetes develops. Our results underline the capability of xfOCM to contribute to the understanding of the development of diabetes, especially when considering islet volume distribution instead of the total β-cell volume only.

10.
J Theor Biol ; 259(3): 523-32, 2009 Aug 07.
Article in English | MEDLINE | ID: mdl-19409907

ABSTRACT

The modelling of prey-predator interactions is of major importance for the understanding of population dynamics. Classically, these interactions are modelled using ordinary differential equations, but this approach has the drawbacks of assuming continuous population variables and of being deterministic. We propose a general approach to stochastic modelling based on the concept of functional response for a prey depletion process with a constant number of predators. Our model could involve any kind of functional response, and permits a likelihood-based approach to statistical modelling and stable computation using matrix exponentials. To illustrate the method we use the Holling-Juliano functional response and compare the outcomes of our model with a deterministic counterpart considered by Schenk and Bacher [2002. Functional response of a generalist insect predator to one of its prey species in the field. Journal of Animal Ecology 71 (3), 524-531], who observed the depletion of Cassida rubiginosa due to its exclusive predator, Polistes dominulus. The predation was found to be Holling type III, reflecting the ability of the predator to regulate its prey. Our approach corroborates this result, but suggests that the prey depletion census should have been performed more often, and that predation features were significantly different between the two years for which data are available.
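
As a rough illustration of the kind of computation this abstract describes, the sketch below treats prey depletion under a constant number of predators as a pure-death Markov chain and evaluates the starting-state row of the matrix exponential exp(Qt) by uniformization. It is not the authors' implementation: the abstract uses the Holling-Juliano functional response, whereas this sketch swaps in the standard Holling type II form, and the names `holling_type2` and `depletion_distribution` and the parameter values `a` and `h` are hypothetical.

```python
import math


def holling_type2(n, a=0.5, h=0.1):
    """Holling type II consumption rate per predator (illustrative parameters)."""
    return a * n / (1.0 + a * h * n)


def depletion_distribution(n0, predators, t, response=holling_type2, tol=1e-10):
    """Distribution of surviving prey at time t, starting from n0 prey.

    The generator Q has death rate predators * response(n) in state n.
    The row of exp(Q t) for state n0 is computed by uniformization:
    a Poisson-weighted sum over steps of a discrete-time chain.
    """
    rates = [predators * response(n) for n in range(n0 + 1)]  # rates[0] is 0
    lam = max(rates) or 1.0  # uniformization rate; 1.0 if every rate is zero

    def step(p):
        # One step of the uniformized chain: lose one prey with prob rate/lam.
        q = [0.0] * (n0 + 1)
        for n, mass in enumerate(p):
            move = rates[n] / lam
            q[n] += mass * (1.0 - move)
            if n > 0:
                q[n - 1] += mass * move
        return q

    p = [0.0] * (n0 + 1)
    p[n0] = 1.0
    weight = math.exp(-lam * t)  # Poisson weight for k = 0
    result = [weight * x for x in p]
    total, k = weight, 0
    # Add Poisson-weighted terms until almost all probability mass is covered.
    while 1.0 - total > tol and k < 10_000:
        k += 1
        p = step(p)
        weight *= lam * t / k
        total += weight
        result = [r + weight * x for r, x in zip(result, p)]
    return result


dist = depletion_distribution(n0=20, predators=2, t=1.0)
print("P(no prey left) =", dist[0])
```

Because the function returns a full probability distribution over prey counts, the likelihood of an observed census is simply the entry of `dist` at the observed count, which is what makes a likelihood-based fit possible in this setting.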


Subjects
Ecosystem; Insects/physiology; Predatory Behavior; Stochastic Processes; Animals; Food Chain; Models, Biological; Population Dynamics
11.
Theor Appl Genet ; 115(7): 933-44, 2007 Nov.
Article in English | MEDLINE | ID: mdl-17874063

ABSTRACT

Many quantitative genetic statistics are functions of variance components, for which a large number of replicates is needed for precise estimates and reliable measures of uncertainty, on which sound interpretation depends. Moreover, in large experiments the deaths of some individuals can occur, so methods for analysing such data need to be robust to missing values. We show how confidence intervals for narrow-sense heritability can be calculated in a nested full-sib/half-sib breeding design (males crossed with several females) in the presence of missing values. Simulations indicate that the method provides accurate results, and that estimator uncertainty is lowest for sampling designs with many males relative to the number of females per male, and with more females per male than progenies per female. Missing data generally had little influence on estimator accuracy, thus suggesting that the overall number of observations should be increased even if this results in unbalanced data. We also suggest the use of parametrically simulated data for prior investigation of the accuracy of planned experiments. Together with the proposed confidence intervals an informed decision on the optimal sampling design is possible, which allows efficient allocation of resources.


Subjects
Genetics, Population/statistics & numerical data; Models, Genetic; Models, Statistical; Analysis of Variance; Confidence Intervals; Female; Humans; Male
12.
Plant Physiol ; 143(4): 1484-92, 2007 Apr.
Article in English | MEDLINE | ID: mdl-17277092

ABSTRACT

We evaluated the application of gas chromatography-mass spectrometry metabolic fingerprinting to classify forward genetic mutants with similar phenotypes. Mutations affecting distinct metabolic or signaling pathways can result in common phenotypic traits that are used to identify mutants in genetic screens. Measurement of a broad range of metabolites provides information about the underlying processes affected in such mutants. Metabolite profiles of Arabidopsis (Arabidopsis thaliana) mutants defective in starch metabolism and uncharacterized mutants displaying a starch-excess phenotype were compared. Each genotype displayed a unique fingerprint. Statistical methods grouped the mutants robustly into distinct classes. Determining the genes mutated in three uncharacterized mutants confirmed that those clustering with known mutants were genuinely defective in starch metabolism. A mutant that clustered away from the known mutants was defective in the circadian clock and had a pleiotropic starch-excess phenotype. These results indicate that metabolic fingerprinting is a powerful tool that can rapidly classify forward genetic mutants and streamline the process of gene discovery.


Subjects
Arabidopsis/classification; Mutation; Arabidopsis/genetics; Arabidopsis/metabolism; Gas Chromatography-Mass Spectrometry; Phenotype; Starch/biosynthesis
13.
Pain ; 122(1-2): 14.e1-14, 2006 May.
Article in English | MEDLINE | ID: mdl-16542774

ABSTRACT

Experimental models of peripheral nerve injury have been developed to study mechanisms of neuropathic pain. In the spared nerve injury (SNI) model in rats, the common peroneal and tibial nerves are injured, producing consistent and reproducible pain hypersensitivity in the territory of the spared sural nerve. In this study, we investigated whether SNI in mice is also a valid model system for neuropathic pain. SNI results in a significant decrease in withdrawal threshold in SNI-operated mice. The effect is very consistent between animals and persists for the four weeks of the study. We also determined the relative frequency of paw withdrawal for each of a series of 11 von Frey hairs. Analysis of response frequency using a mixed-effects model that integrates all variables (nerve injury, paw, gender, and time) shows a very stable effect of SNI over time and also reveals subtle divergences between variables, including gender-based differences in mechanical sensitivity. We tested two variants of the SNI model and found that injuring the tibial nerve alone induces mechanical hypersensitivity, while injuring the common peroneal and sural nerves together does not induce any significant increase in mechanical sensitivity in the territory of the spared tibial nerve. SNI induces a mechanical allodynia-like response in mice and we believe that our improved method of assessment and data analysis will reveal additional internal and external variability factors in models of persistent pain. Use of this model in genetically altered mice should be very effective for determining the mechanisms involved in neuropathic pain.


Subjects
Disease Models, Animal; Hyperalgesia/etiology; Hyperalgesia/physiopathology; Pain Threshold; Sciatic Neuropathy/complications; Sciatic Neuropathy/physiopathology; Animals; Female; Male; Mice; Mice, Inbred C57BL
14.
Stat Med ; 22(24): 3805-21, 2003 Dec 30.
Article in English | MEDLINE | ID: mdl-14673940

ABSTRACT

The intraclass correlation coefficient rho plays a key role in the design of cluster randomized trials. Estimates of rho obtained from previous cluster trials and used to inform sample size calculation in planned trials may be imprecise due to the typically small numbers of clusters in such studies. It may be useful to quantify this imprecision. This study used simulation to compare different methods for assigning bootstrap confidence intervals to rho for continuous outcomes from a balanced design. Data were simulated for combinations of numbers of clusters (10, 30, 50), intraclass correlation coefficients (0.001, 0.01, 0.05, 0.3) and outcome distributions (normal, non-normal continuous). The basic, bootstrap-t, percentile, bias corrected and bias corrected accelerated bootstrap intervals were compared with new methods using the basic and bootstrap-t intervals applied to a variance stabilizing transformation of rho. The standard bootstrap methods provided coverage levels for 95 per cent intervals that were markedly lower than the nominal level for data sets with only 10 clusters, and only provided close to 95 per cent coverage when there were 50 clusters. Application of the bootstrap-t method to the variance stabilizing transformation of rho improved upon the performance of the standard bootstrap methods, providing close to nominal coverage.
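
To make the setting concrete, here is a minimal sketch of one of the standard methods the abstract compares: a percentile bootstrap interval for rho in a balanced design, resampling whole clusters. It assumes the one-way ANOVA estimator of rho; the function names `icc_anova`, `percentile_ci` and `simulate`, and the simulation settings, are hypothetical and are not taken from the paper. The variance-stabilized bootstrap-t method that the study found best is not implemented here.

```python
import random
from statistics import mean


def icc_anova(clusters):
    """One-way ANOVA estimator of the intraclass correlation (balanced design)."""
    k = len(clusters)      # number of clusters
    n = len(clusters[0])   # observations per cluster
    grand = mean([x for c in clusters for x in c])
    means = [mean(c) for c in clusters]
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    msw = sum((x - m) ** 2 for c, m in zip(clusters, means) for x in c) / (k * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)


def percentile_ci(clusters, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for rho, resampling clusters with replacement."""
    rng = random.Random(seed)
    k = len(clusters)
    stats = sorted(
        icc_anova([clusters[rng.randrange(k)] for _ in range(k)])
        for _ in range(n_boot)
    )
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


def simulate(k, n, rho, seed=1):
    """Normal outcomes with unit total variance and intraclass correlation rho."""
    rng = random.Random(seed)
    data = []
    for _ in range(k):
        effect = rng.gauss(0.0, rho ** 0.5)  # shared cluster effect
        data.append([effect + rng.gauss(0.0, (1 - rho) ** 0.5) for _ in range(n)])
    return data


data = simulate(k=30, n=10, rho=0.05)
print("estimate:", icc_anova(data))
print("95% percentile interval:", percentile_ci(data))
```

Repeating the simulation many times and counting how often the interval contains the true rho gives the coverage level, which is the quantity the study compares across methods and numbers of clusters.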


Subjects
Cluster Analysis; Confidence Intervals; Randomized Controlled Trials as Topic/statistics & numerical data; Statistics, Nonparametric; Models, Statistical; Reproducibility of Results