1.
Entropy (Basel); 23(6), 2021 May 27.
Article in English | MEDLINE | ID: mdl-34072055

ABSTRACT

In disease modeling, a key statistical problem is the estimation of lower and upper tail probabilities of health events from data sets of small size and limited range. Under such constraints, we describe a computational framework for the systematic fusion of observations from multiple sources to compute tail probabilities that could not otherwise be obtained because of a lack of lower or upper tail data. The estimation of multivariate lower and upper tail probabilities from a small reference data set that lacks complete information about such tail data is addressed in terms of pertussis case count data. Fusion of data from multiple sources, in conjunction with the density ratio model, gives probability estimates that cannot be obtained from the empirical distribution. Based on a density ratio model with variable tilts, we first present a univariate fit and subsequently improve it with a multivariate extension. In the multivariate analysis, the best model was selected by the Akaike Information Criterion (AIC). Regional prediction of the number of pertussis cases in Washington state is approached by providing joint probabilities, computed under the selected density ratio model from fused data drawn from several relatively small samples. The model is validated by a graphical goodness-of-fit plot comparing the estimated reference distribution obtained from the fused data with the empirical distribution obtained from the reference sample alone.
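The fusion idea in this abstract can be illustrated, under simplifying assumptions, with the two-sample density ratio model g1(t) = g0(t) exp(alpha + beta h(t)), where g0 is the reference density. This is a minimal sketch, not the paper's multivariate, variable-tilt procedure: it assumes one extra sample, a scalar tilt h(t) = t, and made-up data, and it relies on the known equivalence between this model and logistic regression on the pooled sample (the fitted intercept absorbs log(n1/n0)). The estimated reference distribution places mass p_i = 1/(n0 [1 + exp(intercept + beta h(t_i))]) on every fused observation t_i, so a tail probability beyond the largest reference value can be nonzero. The helper names `fit_logistic` and `drm_reference_weights` are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, iters=100, tol=1e-10):
    """Unpenalized logistic regression by Newton-Raphson."""
    coef = np.zeros(X.shape[1])
    for _ in range(iters):
        pi = 1.0 / (1.0 + np.exp(-(X @ coef)))
        hess = X.T @ (X * (pi * (1.0 - pi))[:, None])
        step = np.linalg.solve(hess, X.T @ (y - pi))
        coef += step
        if np.max(np.abs(step)) < tol:
            break
    return coef


def drm_reference_weights(x0, x1, h=lambda t: t):
    """Fit g1(t) = g0(t) * exp(alpha + beta * h(t)) (assumed scalar tilt) and
    return the fused sample t with the estimated point masses of G0 on t."""
    t = np.concatenate([x0, x1])
    y = np.concatenate([np.zeros(len(x0)), np.ones(len(x1))])  # 0 = reference
    X = np.column_stack([np.ones_like(t), h(t)])
    coef = fit_logistic(X, y)  # intercept = alpha + log(n1 / n0)
    p = 1.0 / (len(x0) * (1.0 + np.exp(X @ coef)))  # sums to 1 at the MLE
    return t, p


# Hypothetical data: the reference sample never exceeds 1.5.
x0 = np.linspace(-2.0, 1.5, 30)  # small reference sample, limited range
x1 = np.linspace(-1.0, 3.0, 60)  # second source that reaches the upper tail
T = 2.0

t, p = drm_reference_weights(x0, x1)
empirical_tail = np.mean(x0 > T)  # 0.0: no reference values above T
fused_tail = p[t > T].sum()       # positive: mass borrowed from the fused sample
print(f"empirical P(X0 > {T}) = {empirical_tail:.4f}, fused = {fused_tail:.4f}")
```

The weights sum to one because the logistic score equation for the intercept forces the fitted probabilities of belonging to the reference sample to add up to n0.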

2.
Stat Med; 35(18): 3229-40, 2016 Aug 15.
Article in English | MEDLINE | ID: mdl-26891189

ABSTRACT

In food safety and biosurveillance, it is often desirable to estimate the probability that a contaminant, or a function thereof, exceeds an unsafe high threshold. This probability is very small, and estimating it requires information about large values. In many cases, the data do not contain information about exceedingly large contamination levels, which ostensibly renders the problem unsolvable. A solution is suggested whereby more information about small tail probabilities is obtained by repeatedly combining the real data with computer-generated data. This method provides short yet reliable interval estimates based on moderately large samples. An illustration is provided in terms of lead exposure data. Copyright © 2016 John Wiley & Sons, Ltd.
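Repeatedly combining real and computer-generated data can be sketched by fitting a two-sample density ratio model many times, each time pairing the real sample with a fresh synthetic sample, and collecting the resulting tail estimates. Everything specific here is an assumption for illustration: the linear tilt h(t) = t, uniform synthetic samples reaching three times the largest observation, the lognormal stand-in for exposure data, and the use of plain quantiles to summarize the estimates. The papers' own rule for turning the sequence of estimates into interval estimates is not reproduced, and the function names are hypothetical.

```python
import numpy as np


def drm_tail_prob(x0, x1, T, iters=100):
    """Estimate P(X0 > T) under g1(t) = g0(t) exp(alpha + beta t),
    fitted through the equivalent logistic regression on the fused sample."""
    t = np.concatenate([x0, x1])
    y = np.concatenate([np.zeros(len(x0)), np.ones(len(x1))])
    X = np.column_stack([np.ones_like(t), t])
    coef = np.zeros(2)
    for _ in range(iters):  # Newton-Raphson
        pi = 1.0 / (1.0 + np.exp(-(X @ coef)))
        hess = X.T @ (X * (pi * (1.0 - pi))[:, None])
        step = np.linalg.solve(hess, X.T @ (y - pi))
        coef += step
        if np.max(np.abs(step)) < 1e-10:
            break
    p = 1.0 / (len(x0) * (1.0 + np.exp(X @ coef)))  # masses of G0 on t
    return p[t > T].sum()


def repeated_fusion(x0, T, n_rep=100, seed=0):
    """Fuse x0 with n_rep independent synthetic samples (hypothetical choice:
    uniform on [min(x0), 3 * max(x0)], same size as x0)."""
    rng = np.random.default_rng(seed)
    lo, hi = x0.min(), 3.0 * x0.max()
    return np.array([
        drm_tail_prob(x0, rng.uniform(lo, hi, size=len(x0)), T)
        for _ in range(n_rep)
    ])


rng = np.random.default_rng(1)
x0 = rng.lognormal(mean=0.0, sigma=0.5, size=50)  # stand-in for real data
T = x0.max()  # threshold beyond every observed value
est = repeated_fusion(x0, T)
print("empirical:", np.mean(x0 > T))
print("median fused estimate:", np.median(est))
print("5%-95% range:", np.quantile(est, [0.05, 0.95]))
```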


Subjects
Food Safety, Probability
3.
Stat Med; 34(11): 1940-52, 2015 May 20.
Article in English | MEDLINE | ID: mdl-25490870

ABSTRACT

In US states with small subpopulations, the observed mortality rates are often zero, particularly at young ages. Because death rates in life tables are mostly reported on a log scale, zero mortality rates are problematic. To overcome this problem, appropriate probability models are used, and the observed zero mortality rates are replaced by the corresponding expected values under these models. This enables logarithmic transformations and, in some cases, the fitting of the eight-parameter Heligman-Pollard model to produce mortality estimates for ages 0-130 years, a procedure illustrated with mortality data from several states.
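For reference, the eight-parameter Heligman-Pollard model expresses the odds q_x / (1 - q_x) of dying between ages x and x + 1 as the sum of a childhood term, an accident-hump term, and a senescent term. The sketch below evaluates that formula over ages 0 to 130. The parameter values are hypothetical placeholders, not estimates from the paper, and the paper's zero-replacement models and fitting procedure are not shown. Because every modeled q_x is strictly positive, its logarithm is always finite, which is exactly the property an observed zero rate lacks.

```python
import numpy as np


def heligman_pollard_q(x, A, B, C, D, E, F, G, H):
    """Death probability q_x from the Heligman-Pollard formula:
    q_x / (1 - q_x) = A**((x + B)**C) + D*exp(-E*(ln x - ln F)**2) + G*H**x."""
    x = np.asarray(x, dtype=float)
    with np.errstate(divide="ignore"):  # ln(0) = -inf, so the hump term is 0 at x = 0
        hump = D * np.exp(-E * (np.log(x) - np.log(F)) ** 2)
    odds = A ** ((x + B) ** C) + hump + G * H ** x
    return odds / (1.0 + odds)


# Illustrative parameter values only (hypothetical, not fitted to any state).
params = dict(A=0.0005, B=0.01, C=0.1, D=0.001, E=10.0, F=20.0, G=0.00005, H=1.1)
ages = np.arange(0, 131)
q = heligman_pollard_q(ages, **params)
log_q = np.log(q)  # finite everywhere, unlike the log of an observed zero rate
print(q[[0, 20, 40, 80, 130]].round(5))
```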


Subjects
Life Tables, Models, Statistical, Mortality/trends, Adolescent, Adult, Aged, Child, Child, Preschool, Female, Humans, Infant, Infant, Newborn, Life Expectancy, Male, Middle Aged, Probability, United States/epidemiology
4.
J Stat Theory Pract; 15(2): 25, 2021.
Article in English | MEDLINE | ID: mdl-33495693

ABSTRACT

Synthetic data, when properly used, can enhance patterns in real data and thus provide insights into different problems. Here, the estimation of tail probabilities of rare events from a moderately large number of observations is considered. The problem is approached through a large number of augmentations, or fusions, of the real data with computer-generated synthetic samples. The tail probability of interest is approximated by subsequences created by a novel iterative process. The estimates are found to be quite precise.

5.
Stat Med; 28(16): 2147-59, 2009 Jul 20.
Article in English | MEDLINE | ID: mdl-19472167

ABSTRACT

Bivariate semiparametric inference based on a two-dimensional density ratio model is discussed and applied to testing the significance of risk factors for testicular germ cell tumors. The results of the joint analysis of height and weight data from a case-control study show that these two factors are jointly significant, whereas body mass index, which is a function of height and weight, is not a significant risk factor.
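A joint significance test of this kind can be sketched with a two-dimensional tilt h(x) = (height, weight) in the density ratio model g1(x) = g0(x) exp(alpha + beta' h(x)), where g0 and g1 are the control and case densities. The sketch assumes the known equivalence between this semiparametric model and logistic regression on the pooled case-control sample, tests H0: beta = 0 with a likelihood ratio statistic, and refers it to a chi-square distribution with 2 degrees of freedom, whose upper tail probability is exp(-s/2). The simulated heights and weights and the helper names are invented for illustration and are not the study's data or code.

```python
import numpy as np


def logistic_fit_loglik(X, y, iters=100):
    """Newton-Raphson logistic regression; returns the maximized log-likelihood."""
    coef = np.zeros(X.shape[1])
    for _ in range(iters):
        pi = 1.0 / (1.0 + np.exp(-(X @ coef)))
        hess = X.T @ (X * (pi * (1.0 - pi))[:, None])
        step = np.linalg.solve(hess, X.T @ (y - pi))
        coef += step
        if np.max(np.abs(step)) < 1e-10:
            break
    eta = X @ coef
    return np.sum(y * eta - np.logaddexp(0.0, eta))


def drm_lr_test(controls, cases):
    """LR test of beta = 0 in g1(x) = g0(x) exp(alpha + beta' x), x in R^2."""
    Z = np.vstack([controls, cases])
    y = np.concatenate([np.zeros(len(controls)), np.ones(len(cases))])
    ones = np.ones((len(Z), 1))
    ll_full = logistic_fit_loglik(np.hstack([ones, Z]), y)
    ll_null = logistic_fit_loglik(ones, y)
    stat = max(2.0 * (ll_full - ll_null), 0.0)
    p_value = np.exp(-stat / 2.0)  # chi-square(2) survival function
    return stat, p_value


rng = np.random.default_rng(0)
# Hypothetical (height cm, weight kg) samples; cases are shifted upward.
controls = np.column_stack([rng.normal(178, 7, 200), rng.normal(75, 10, 200)])
cases = np.column_stack([rng.normal(185, 7, 200), rng.normal(85, 10, 200)])
stat, p_value = drm_lr_test(controls, cases)
print(f"LR statistic = {stat:.2f}, p-value = {p_value:.2e}")
```

Testing the two tilt coefficients together, rather than a single derived variable such as body mass index, is what makes this a joint test of both factors.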


Subjects
Biometry/methods, Neoplasms, Germ Cell and Embryonal/etiology, Testicular Neoplasms/etiology, Body Height, Body Weight, Case-Control Studies, Data Interpretation, Statistical, Humans, Male, Models, Statistical, Risk Factors
6.
Int J Health Geogr; 8: 73, 2009 Dec 31.
Article in English | MEDLINE | ID: mdl-20043837

ABSTRACT

BACKGROUND: A semiparametric density ratio method, which borrows strength from two or more samples, can be applied to moving windows of variable size in cluster detection. The method requires prior knowledge of neither the underlying distribution nor the number of cases before scanning. In this paper, the semiparametric cluster detection procedure is combined with Storey's q-value, a false discovery rate (FDR) controlling method, to account for the multiple testing problem induced by the overlapping scanning windows. RESULTS: Simulations using Kulldorff's Northeastern benchmark data show that, for binary data, the semiparametric method and Kulldorff's method perform similarly well. When the data are not binary, the semiparametric methodology still works in many cases, but Kulldorff's method requires the choice of a correct probability model, namely the correct scan statistic, to achieve power comparable to that of the semiparametric method. With an inappropriate probability model, Kulldorff's method may lose power. CONCLUSIONS: The semiparametric method proposed in the paper can achieve good power in detecting localized clusters. The method requires no specific distributional assumption other than the tilt function. In addition, other scan schemes (e.g., elliptic spatial scan, flexible shape scan) can be adapted to search for clusters as well.
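Storey's q-value, used here to handle the multiple testing across overlapping windows, can be computed from a vector of p-values as sketched below. The p-values would come from the individual window tests, which are not reproduced; the ten values used here are hypothetical. The sketch estimates the null proportion pi0 with a single tuning value lambda = 0.5, as in Storey's estimator pi0 = #{p_i > lambda} / (m (1 - lambda)); a common refinement smooths this estimate over a grid of lambda values instead, so results from standard software can differ.

```python
import numpy as np


def storey_qvalues(pvals, lam=0.5):
    """q-values with Storey's pi0 estimate at a fixed lambda (no smoothing)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    pi0 = min(1.0, np.mean(p > lam) / (1.0 - lam))  # estimated null proportion
    order = np.argsort(p)
    ranks = np.arange(1, m + 1)
    raw = pi0 * m * p[order] / ranks  # pi0 * m * p_(i) / i
    # q_(i) = min over j >= i of raw_(j), so q-values are monotone in p
    q_sorted = np.minimum.accumulate(raw[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q, pi0


# Hypothetical p-values from ten scanning windows.
pvals = [0.001, 0.01, 0.02, 0.04, 0.3, 0.6, 0.7, 0.8, 0.9, 0.95]
q, pi0 = storey_qvalues(pvals)
print(pi0)  # 1.0
print(q.round(4))
```

With pi0 estimated as 1, the q-values coincide with Benjamini-Hochberg adjusted p-values; a smaller pi0 shrinks them proportionally.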


Subjects
Data Interpretation, Statistical, Poisson Distribution, Space-Time Clustering, Biometry/methods, Child, Computer Simulation, Humans, Leukemia/chemically induced, Leukemia/epidemiology, Likelihood Functions, Male, Paternal Exposure, United States/epidemiology
7.
J Stat Theory Pract; 8(3): 444-459, 2014.
Article in English | MEDLINE | ID: mdl-38650966

ABSTRACT

The probability that mortality from certain causes exceeds high thresholds is addressed. An out-of-sample fusion method is presented in which an original real data sample is fused, or combined, with independent computer-generated samples to estimate exceedance probabilities under a density ratio model. Because the combined sample of real and artificial data is larger than the real sample, the fused sample produces shorter confidence intervals than traditional methods. Numerical results show that the method maintains good coverage even in some misspecified cases.

8.
Genomics; 89(2): 300-5, 2007 Feb.
Article in English | MEDLINE | ID: mdl-17125967

ABSTRACT

Escherichia coli K (JM109) and E. coli B (BL21) are strains used routinely for recombinant protein production. These two strains grow and respond differently to environmental factors such as glucose and oxygen concentration. The differences have been attributed to differential expression of individual genes in certain metabolic pathways of central carbon metabolism. Using a semiparametric algorithm based on a density ratio model, it was possible to compare and quantify the expression patterns of groups of genes involved in several central carbon metabolic pathways. The groups comprising the glyoxylate shunt, TCA cycle, fatty acid, and gluconeogenesis and anaplerotic pathways were expressed differently between the two strains, whereas no differences were apparent for the groups comprising either glycolysis or the pentose phosphate pathway. These results further characterized the differences between the two E. coli strains and illustrated the usefulness of the semiparametric algorithm.


Subjects
Carbon/metabolism, Escherichia coli/genetics, Escherichia coli/metabolism, Oligonucleotide Array Sequence Analysis/statistics & numerical data, Algorithms, Citric Acid Cycle/genetics, Escherichia coli/classification, Fatty Acids/metabolism, Gluconeogenesis/genetics, Glucose/metabolism, Glycolysis/genetics, Glyoxylates/metabolism, Models, Biological, Models, Genetic, Models, Statistical, Pentose Phosphate Pathway/genetics, Species Specificity