Results 1 - 20 of 47
1.
Bioinformatics ; 40(2), 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38364309

ABSTRACT

MOTIVATION: Estimating the individual inbreeding coefficient and pairwise kinship is an important problem in human genetics (e.g. in disease mapping) and in animal and plant genetics (e.g. inbreeding design). Existing methods, such as the sample correlation-based genetic relationship matrix, KING, and UKin, are either biased, unable to estimate inbreeding coefficients, or prone to producing a large proportion of negative estimates that are difficult to interpret. This limitation is partly due to their failure to model inbreeding explicitly. Since all humans are inbred to various degrees by virtue of shared ancestries, it is prudent to account for inbreeding when inferring kinship between individuals. RESULTS: We present "Kindred," an approach that estimates inbreeding and kinship by modeling latent identity-by-descent states that account for all possible allele sharing, including inbreeding, between two individuals. Kindred uses a non-negative least squares method to fit the model, which not only increases computational efficiency compared to the maximum likelihood method, but also guarantees non-negativity of the kinship estimates. Through simulation, we demonstrate the high accuracy and non-negativity of kinship estimates by Kindred. By selecting a subset of SNPs that are similar in allele frequencies across different continental populations, Kindred can accurately estimate kinship between admixed samples. In addition, we demonstrate that the realized kinship matrix estimated by Kindred is effective in reducing genomic control values via linear mixed model in genome-wide association studies. Finally, we demonstrate that Kindred produces sensible heritability estimates on an Australian height dataset. AVAILABILITY AND IMPLEMENTATION: Kindred is implemented in C with multi-threading. It takes a VCF file or stream as input and works seamlessly with bcftools. Kindred is freely available at https://github.com/haplotype/kindred.
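
To illustrate why a non-negative least squares fit cannot return negative estimates, here is a minimal sketch of a generic solver based on projected gradient descent. It is not Kindred's implementation (which is written in C and is not described in the abstract); the function name, the toy design matrix `A`, and the vector `b` are all assumptions made for illustration.

```python
def nnls_projected_gradient(A, b, iterations=500):
    """Approximately minimize ||A x - b||^2 subject to x >= 0.

    Generic projected gradient descent; an illustrative stand-in,
    not the solver used by Kindred.
    """
    n_rows, n_cols = len(A), len(A[0])
    # trace(A^T A) bounds the largest eigenvalue of A^T A from above,
    # so 1 / trace is a safe (conservative) step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n_cols
    for _ in range(iterations):
        residual = [
            sum(A[i][j] * x[j] for j in range(n_cols)) - b[i]
            for i in range(n_rows)
        ]
        gradient = [
            sum(A[i][j] * residual[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * gradient[j]) for j in range(n_cols)]
    return x


# Hypothetical toy problem: ordinary least squares would give x = (1, -1),
# whereas the constrained fit clamps the second coefficient at zero.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, -1.0, 0.0]
estimate = nnls_projected_gradient(A, b)
```

In this toy problem the unconstrained solution has a negative component, while the constrained fit settles at approximately (0.5, 0), which is the kind of guarantee the abstract attributes to the non-negative least squares approach.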


Subjects
Genome-Wide Association Study; Inbreeding; Animals; Humans; Australia; Genome; Gene Frequency; Pedigree
2.
Biometrics ; 80(3), 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39101549

ABSTRACT

Many existing methodologies for analyzing spatiotemporal point patterns are developed based on the assumption of stationarity in both space and time for the second-order intensity or pair correlation. In practice, however, such an assumption often lacks validity or proves to be unrealistic. In this paper, we propose a novel and flexible nonparametric approach for estimating the second-order characteristics of spatiotemporal point processes, accommodating non-stationary temporal correlations. Our proposed method employs kernel smoothing and effectively accounts for spatial and temporal correlations differently. Under a spatially increasing-domain asymptotic framework, we establish consistency of the proposed estimators, which can be constructed using different first-order intensity estimators to enhance practicality. Simulation results reveal that our method, in comparison with existing approaches, significantly improves statistical efficiency. An application to a COVID-19 dataset further illustrates the flexibility and interpretability of our procedure.


Subjects
COVID-19; Computer Simulation; Spatio-Temporal Analysis; Humans; Statistics, Nonparametric; Models, Statistical; SARS-CoV-2; Biometry/methods; Data Interpretation, Statistical
3.
Genome Res ; 30(9): 1364-1375, 2020 09.
Article in English | MEDLINE | ID: mdl-32883749

ABSTRACT

We present Nubeam (nucleotide be a matrix) as a novel reference-free approach to analyze short sequencing reads. Nubeam represents nucleotides by matrices, transforms a read into a product of matrices, and assigns numbers to reads based on the product matrix. Nubeam capitalizes on the noncommutative property of matrix multiplication, such that different reads are assigned different numbers and similar reads similar numbers. A sample, which is a collection of reads, becomes a collection of numbers that form an empirical distribution. We demonstrate that the genetic difference between samples can be quantified by the distance between empirical distributions. Nubeam includes the k-mer method as a special case, but unlike the k-mer method, it is convenient for Nubeam to account for GC bias and nucleotide quality. As a reference-free approach, Nubeam avoids reference bias and mapping bias, and can work with organisms without reference genomes. Thus, Nubeam is ideal to analyze data sets from metagenomics whole genome shotgun (WGS) sequencing, where the amount of unmapped reads is substantial. When applied to a WGS sequencing data set to quantify distances between metagenomics samples from various human body habitats, Nubeam recapitulates findings made by mapping-based methods and sheds light on contributions of unmapped reads. Nubeam is also useful in analyzing 16S rRNA sequencing data, which is a more prevalent type of data set in metagenomics studies. In our analysis, Nubeam recapitulated the findings that natural microbiota in mouse gut are resilient under challenges, and Nubeam detected differences in vaginal microbiota between cases of polycystic ovary syndrome and healthy controls.
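
The role of noncommutativity can be illustrated with a small sketch. The abstract does not give Nubeam's actual matrices or its rule for turning a product matrix into a number, so everything below is an assumption: each nucleotide is encoded as two bits, and each bit selects one of two 2x2 integer matrices whose products are known to be distinct for distinct bit strings.

```python
# Hypothetical encoding, not taken from the Nubeam paper.
L = ((1, 1), (0, 1))
R = ((1, 0), (1, 1))
BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def matmul(x, y):
    """Multiply two 2x2 integer matrices stored as nested tuples."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def read_to_matrix(read):
    """Turn a read into the ordered product of its per-bit matrices."""
    product = ((1, 0), (0, 1))  # identity matrix
    for base in read:
        for bit in BITS[base]:
            product = matmul(product, L if bit == "0" else R)
    return product


forward = read_to_matrix("ACGT")
reverse = read_to_matrix("TGCA")
# Same bases in a different order give a different product,
# because matrix multiplication does not commute.
order_matters = forward != reverse
```

The sketch stops at the product matrix. How Nubeam maps that matrix to a single number, and how it handles GC bias and base quality, is described in the paper and is not reproduced here.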


Subjects
Metagenomics/methods; Whole Genome Sequencing/methods; Animals; Female; Gastrointestinal Microbiome; Humans; Mice; RNA, Ribosomal, 16S; Sequence Analysis, RNA/methods; Vagina/microbiology
4.
Bioinformatics ; 36(10): 3254-3256, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32091581

ABSTRACT

SUMMARY: We present Nubeam-dedup, a fast and RAM-efficient tool to de-duplicate sequencing reads without a reference genome. Nubeam-dedup represents nucleotides by matrices, transforms reads into products of matrices, and assigns a unique number to each read based on its product matrix. Thus, duplicate reads can be efficiently removed by using a collisionless hash function. Compared with other state-of-the-art reference-free tools, Nubeam-dedup uses 50-70% of CPU time and 10-15% of RAM. AVAILABILITY AND IMPLEMENTATION: Source code in C++ and manual are available at https://github.com/daihang16/nubeamdedup and https://haplotype.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
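
The de-duplication idea can be sketched as follows: compute a key per read from a matrix product and keep only the first read seen for each key. This is an illustration under stated assumptions, not the Nubeam-dedup code; the two-bit encoding, the choice of matrices, and the function names are hypothetical, and the real tool is written in C++ and derives a single number rather than a tuple.

```python
# Hypothetical two-bit encoding of nucleotides (not from the paper).
TWO_BIT = {"A": "00", "C": "01", "G": "10", "T": "11"}


def read_key(read):
    """Return the entries (a, b, c, d) of the 2x2 product matrix for a read.

    Bit 0 multiplies on the right by [[1, 1], [0, 1]] and bit 1 by
    [[1, 0], [1, 1]]; distinct bit strings give distinct products,
    so the tuple works as a collision-free key in this sketch.
    """
    a, b, c, d = 1, 0, 0, 1  # identity matrix
    for base in read:
        for bit in TWO_BIT[base]:
            if bit == "0":
                b, d = a + b, c + d
            else:
                a, c = a + b, c + d
    return (a, b, c, d)


def deduplicate(reads):
    """Keep the first occurrence of each read, preserving input order."""
    seen = set()
    unique = []
    for read in reads:
        key = read_key(read)
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique


reads = ["ACGT", "TTGA", "ACGT", "TGCA", "TTGA"]
unique_reads = deduplicate(reads)
```

Reads containing characters outside A, C, G, and T (for example N) would raise a `KeyError` in this sketch; how the actual tool treats them is not stated in the abstract.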


Subjects
High-Throughput Nucleotide Sequencing; Software; Algorithms; Genome; Sequence Analysis, DNA
5.
Genet Med ; 22(2): 450, 2020 Feb.
Article in English | MEDLINE | ID: mdl-31822850

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

6.
Genet Med ; 22(2): 301-308, 2020 02.
Article in English | MEDLINE | ID: mdl-31467446

ABSTRACT

PURPOSE: Fetal fraction (FF) is the percent of cell-free DNA (cfDNA) in the mother's peripheral blood that is of fetal origin, which plays a pivotal role in noninvasive prenatal screening (NIPS). We present a method that can reliably estimate FFs by examining autosome single-nucleotide polymorphisms (SNPs). METHODS: Even at a very low sequencing depth, there are plenty of SNPs covered by more than one read. At those SNPs, we define read heterozygosity and demonstrate that the percent of read heterozygosity is a function of FF, which allows FF to be inferred. RESULTS: We first demonstrated the effectiveness of our method in inferring FF. Then we used the inferred FF as an informative alternative prior to computing Bayes factors to test for aneuploidy, and observed better power than the Z-test. In analysis of clinical samples, we were able to identify female-male twins thanks to the accurate FF inference. CONCLUSION: Knowing FF improves efficacy of NIPS. It brings a powerful Bayesian method, allows "no call" for samples with small FFs, renders screening for XXY syndrome simpler, and permits an adaptive design to sequence at a higher depth for samples with small FFs.


Subjects
Cell-Free Nucleic Acids/analysis; Fetal Development/genetics; Noninvasive Prenatal Testing/methods; Chromosome Aberrations; Female; Fetus; High-Throughput Nucleotide Sequencing/methods; Humans; Polymorphism, Single Nucleotide/genetics; Pregnancy; Prenatal Care; Prenatal Diagnosis/methods; Sequence Analysis, DNA/methods
7.
PLoS Genet ; 12(2): e1005847, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26863142

ABSTRACT

Mexicans are a recent admixture of Amerindians, Europeans, and Africans. We performed local ancestry analysis of Mexican samples from two genome-wide association studies obtained from dbGaP, and discovered that at the MHC region Mexicans have excessive African ancestral alleles compared to the rest of the genome, which is the hallmark of recent selection for admixed samples. The estimated selection coefficients are 0.05 and 0.07 for two datasets, which put our finding among the strongest known selections observed in humans, namely, lactase selection in northern Europeans and sickle-cell trait in Africans. Using inaccurate Amerindian training samples was a major concern for the credibility of previously reported selection signals in Latinos. Taking advantage of the flexibility of our statistical model, we devised a model fitting technique that can learn Amerindian ancestral haplotype from the admixed samples, which allows us to infer local ancestries for Mexicans using only European and African training samples. The strong selection signal at the MHC remains without Amerindian training samples. Finally, we note that medical history studies suggest such a strong selection at MHC is plausible in Mexicans.


Subjects
Gene Pool; Major Histocompatibility Complex/genetics; Selection, Genetic; Black People/genetics; Gene Dosage; Genealogy and Heraldry; Humans; Mexico; Principal Component Analysis; White People/genetics
8.
Genet Med ; 20(8): 817-824, 2018 08.
Article in English | MEDLINE | ID: mdl-29120459

ABSTRACT

PURPOSE: Noninvasive prenatal screening (NIPS) sequences a mixture of the maternal and fetal cell-free DNA. Fetal trisomy can be detected by examining chromosomal dosages estimated from sequencing reads. The traditional method uses the Z-test, which compares a subject against a set of euploid controls, where the information of fetal fraction is not fully utilized. Here we present a Bayesian method that leverages informative priors on the fetal fraction. METHOD: Our Bayesian method combines the Z-test likelihood and informative priors of the fetal fraction, which are learned from the sex chromosomes, to compute Bayes factors. The Bayesian framework can account for nongenetic risk factors through the prior odds, and our method can report individual positive/negative predictive values. RESULTS: Our Bayesian method has more power than the Z-test method. We analyzed 3,405 NIPS samples and spotted at least 9 (of 51) possible Z-test false positives. CONCLUSION: Bayesian NIPS is more powerful than the Z-test method, is able to account for nongenetic risk factors through prior odds, and can report individual positive/negative predictive values.
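
The way prior odds enter the result follows from Bayes' rule: posterior odds equal the Bayes factor times the prior odds. The sketch below shows only that final step. The function name and the numbers are hypothetical, and the computation of the Bayes factor itself (from the Z-test likelihood and the fetal fraction prior) is not reproduced because the abstract does not specify it.

```python
def posterior_probability(bayes_factor, prior_probability):
    """Convert a Bayes factor and a prior probability into a posterior probability.

    posterior odds = Bayes factor * prior odds
    """
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical example: the same sequencing evidence (Bayes factor of 100)
# for two pregnancies whose prior risk differs because of nongenetic
# risk factors such as maternal age.
low_prior_risk = posterior_probability(bayes_factor=100.0, prior_probability=0.001)
high_prior_risk = posterior_probability(bayes_factor=100.0, prior_probability=0.01)
```

With identical evidence, the posterior probability is roughly 9% under the lower prior and roughly 50% under the higher prior, which is why an individual predictive value depends on the prior odds as well as on the test statistic.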


Subjects
Bayes Theorem; Prenatal Diagnosis/methods; Sequence Analysis, DNA/methods; Adult; China; Female; Fetus; High-Throughput Nucleotide Sequencing/methods; Humans; Markov Chains; Pregnancy; Prenatal Care
9.
J Theor Biol ; 455: 342-356, 2018 10 14.
Article in English | MEDLINE | ID: mdl-30053386

ABSTRACT

Chikungunya, dengue, and Zika viruses, all transmitted by the Aedes aegypti and Aedes albopictus mosquito species, have been imported to Florida and have caused local outbreaks. We propose a deterministic model to study the importation and local transmission of these mosquito-borne diseases. The purpose is to model and mimic the importation of these viruses to Florida via travelers, local infections in domestic mosquitoes by imported travelers, and finally non-travel related transmissions to local humans by infected local mosquitoes. As a case study, the model is used to simulate the cumulative Zika cases in Florida. Since the disease system is driven by a continuing input of infections from outside sources, orthodox analytic methods based on the calculation of the basic reproduction number are inadequate to describe and predict its behavior. Via steady-state analysis and sensitivity analysis, effective control and prevention measures for these mosquito-borne diseases are tested.


Subjects
Aedes/virology; Disease Outbreaks; Models, Biological; Mosquito Vectors/virology; Zika Virus Infection; Zika Virus; Animals; Chikungunya Fever/epidemiology; Chikungunya Fever/transmission; Chikungunya virus; Dengue/epidemiology; Dengue/transmission; Dengue Virus; Florida/epidemiology; Humans; Zika Virus Infection/epidemiology; Zika Virus Infection/transmission
10.
Biometrics ; 73(4): 1311-1320, 2017 12.
Article in English | MEDLINE | ID: mdl-28369699

ABSTRACT

Applications of spatial point processes for large and complex data sets with inhomogeneities as encountered, for example, in tropical rain forest ecology call for estimation methods that are both statistically and computationally efficient. We propose a novel second-order quasi-likelihood procedure to estimate the parameters for a second-order intensity reweighted stationary spatial point process. Our approach is to derive first- and second-order estimating functions and then combine them linearly using appropriate weight functions. In the stationary case, we argue that the asymptotically optimal weight functions are respectively a constant and a function of lags between distinct locations in the observation window. This leads to a considerable gain in computational efficiency. We further exploit this simplification in the nonstationary case. Simulations show that, when compared with several existing approaches, our method can achieve significant gains in statistical efficiency. An application to a tropical rain forest data set further illustrates the advantages of our procedure.


Subjects
Biometry; Ecology; Models, Statistical; Algorithms; Computer Simulation; Rainforest
11.
Stat Med ; 35(24): 4306-4319, 2016 10 30.
Article in English | MEDLINE | ID: mdl-27241902

ABSTRACT

Recurrent event data are quite common in biomedical and epidemiological studies. A significant portion of these data also contain additional longitudinal information on surrogate markers. Previous studies have shown that popular methods using a Cox model with longitudinal outcomes as time-dependent covariates may lead to biased results, especially when longitudinal outcomes are measured with error. Hence, it is important to incorporate longitudinal information into the analysis properly. To achieve this, we model the correlation between longitudinal and recurrent event processes using latent random effect terms. We then propose a two-stage conditional estimating equation approach to model the rate function of recurrent event process conditioned on the observed longitudinal information. The performance of our proposed approach is evaluated through simulation. We also apply the approach to analyze cocaine addiction data collected by the University of Connecticut Health Center. The data include recurrent event information on cocaine relapse and longitudinal cocaine craving scores. Copyright © 2016 John Wiley & Sons, Ltd.


Subjects
Data Accuracy; Longitudinal Studies; Cocaine-Related Disorders; Humans; Recurrence
12.
Stat Med ; 35(14): 2422-40, 2016 06 30.
Article in English | MEDLINE | ID: mdl-26790617

ABSTRACT

Spatiotemporal calibration of output from deterministic models is an increasingly popular tool to more accurately and efficiently estimate the true distribution of spatial and temporal processes. Current calibration techniques have focused on a single source of data on observed measurements of the process of interest that are both temporally and spatially dense. Additionally, these methods often calibrate deterministic models available in grid-cell format with pixel sizes small enough that the centroid of the pixel closely approximates the measurement for other points within the pixel. We develop a modeling strategy that allows us to simultaneously incorporate information from two sources of data on observed measurements of the process (that differ in their spatial and temporal resolutions) to calibrate estimates from a deterministic model available on a regular grid. This method not only improves estimates of the pollutant at the grid centroids but also refines the spatial resolution of the grid data. The modeling strategy is illustrated by calibrating and spatially refining daily estimates of ambient nitrogen dioxide concentration over Connecticut for 1994 from the Community Multiscale Air Quality model (temporally dense grid-cell estimates on a large pixel size) using observations from an epidemiologic study (spatially dense and temporally sparse) and Environmental Protection Agency monitoring stations (temporally dense and spatially sparse). Copyright © 2016 John Wiley & Sons, Ltd.


Subjects
Models, Statistical; Spatio-Temporal Analysis; Air Pollutants/analysis; Air Pollution/analysis; Air Pollution/statistics & numerical data; Biostatistics; Calibration; Connecticut; Environmental Exposure/analysis; Environmental Exposure/statistics & numerical data; Environmental Monitoring/statistics & numerical data; Humans; Nitrogen Dioxide/analysis; United States; United States Environmental Protection Agency
13.
Biometrics ; 71(4): 1022-33, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26102478

ABSTRACT

We introduce a new multivariate product-shot-noise Cox process which is useful for modeling multi-species spatial point patterns with clustering intra-specific interactions and neutral, negative, or positive inter-specific interactions. The auto- and cross-pair correlation functions of the process can be obtained in closed analytical forms and approximate simulation of the process is straightforward. We use the proposed process to model interactions within and among five tree species in the Barro Colorado Island plot.


Subjects
Proportional Hazards Models; Biometry/methods; Ecosystem; Linear Models; Models, Biological; Models, Statistical; Multivariate Analysis; Normal Distribution; Species Specificity; Trees
14.
Biometrics ; 71(1): 114-121, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25351292

ABSTRACT

We propose a novel statistical framework by supplementing case-control data with summary statistics on the population at risk for a subset of risk factors. Our approach is to first form two unbiased estimating equations, one based on the case-control data and the other on both the case data and the summary statistics, and then optimally combine them to derive another estimating equation to be used for the estimation. The proposed method is computationally simple and more efficient than standard approaches based on case-control data alone. We also establish asymptotic properties of the resulting estimator, and investigate its finite-sample performance through simulation. As a substantive application, we apply the proposed method to investigate risk factors for endometrial cancer, by using data from a recently completed population-based case-control study and summary statistics from the Behavioral Risk Factor Surveillance System, the Population Estimates Program of the US Census Bureau, and the Connecticut Department of Transportation.


Subjects
Algorithms; Case-Control Studies; Data Interpretation, Statistical; Endometrial Neoplasms/epidemiology; Models, Statistical; Risk Assessment/methods; Computer Simulation; Epidemiologic Methods; Female; Humans; Information Storage and Retrieval/methods; Prevalence
15.
Res Sq ; 2023 May 15.
Article in English | MEDLINE | ID: mdl-37333260

ABSTRACT

Genome-wide DNA methylation studies have typically focused on quantitative assessments of CpG methylation at individual loci. Although methylation states at nearby CpG sites are known to be highly correlated, suggestive of an underlying coordinated regulatory network, the extent and consistency of inter-CpG methylation correlation across the genome, including variation between individuals, disease states, and tissues, remains unknown. Here, we leverage image conversion of correlation matrices to identify correlated methylation units (CMUs) across the genome, describe their variation across tissues, and annotate their regulatory potential using 35 public Illumina BeadChip datasets spanning more than 12,000 individuals and 26 different tissues. We identified a median of 18,125 CMUs genome-wide, occurring on all chromosomes and spanning a median of ~1 kb. Notably, 50% of CMUs had evidence of long-range correlation with other proximal CMUs. Although the size and number of CMUs varied across datasets, we observed strong intra-tissue consistency among CMUs, with those in testis encompassing those seen in most other tissues. Approximately 20% of CMUs were highly conserved across normal tissues (i.e. tissue independent), with 73 loci demonstrating strong correlation with non-adjacent CMUs on the same chromosome. These loci were enriched for CTCF and transcription factor binding sites, always found within putative TADs, and associated with the B compartment of chromosome folding. Finally, we observed significantly different, but highly consistent, patterns of CMU correlation between diseased and non-diseased states. Our first-generation, genome-wide, DNA methylation map suggests a highly coordinated CMU regulatory network that is sensitive to disruptions in its architecture.

16.
Stat Methods Med Res ; 31(2): 315-333, 2022 02.
Article in English | MEDLINE | ID: mdl-34931910

ABSTRACT

Cocaine addiction is an important public health problem worldwide. Cognitive-behavioral therapy is a counseling intervention for supporting cocaine-dependent individuals through recovery and relapse prevention. It may reduce patients' cocaine use by improving their motivation and enabling them to recognize risky situations. To study the effect of cognitive-behavioral therapy on cocaine dependence, the self-reported cocaine use with urine test data were collected at the Primary Care Center of Yale-New Haven Hospital. The outcomes are binary, including both the daily self-reported drug use and weekly urine test results. To date, generalized estimating equations have been widely used to analyze binary data with repeated measures. However, due to the existence of significant self-report bias in the self-reported cocaine use with urine test data, a direct application of the generalized estimating equations approach may not be valid. In this paper, we propose a novel mean corrected generalized estimating equations approach for analyzing longitudinal binary outcomes subject to reporting bias. The mean corrected generalized estimating equations can provide consistent and asymptotically normally distributed estimators under true contamination probabilities. In the self-reported cocaine use with urine test study, accurate weekly urine test results are used to detect contamination. The superior performance of the proposed method is illustrated by both simulation studies and real data analysis.
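
A simple way to see what a mean correction does is to assume a one-sided contamination model in which true use is reported as non-use with probability p and non-use is never reported as use. Under that assumption the observed mean equals (1 - p) times the true mean, so dividing by (1 - p) removes the bias. The sketch below is an illustration under this assumed model with made-up weekly data; it is not the paper's estimating equations, and the function names are hypothetical.

```python
def estimate_underreport_rate(self_reports, urine_results):
    """Fraction of urine-positive weeks in which the self-report was negative.

    Assumes the urine test is accurate and that both lists are aligned
    week by week (a simplification of daily self-report data).
    """
    reports_when_positive = [
        s for s, u in zip(self_reports, urine_results) if u == 1
    ]
    missed = sum(1 for s in reports_when_positive if s == 0)
    return missed / len(reports_when_positive)


def corrected_mean(self_reports, underreport_rate):
    """Rescale the observed mean under the one-sided contamination model."""
    observed = sum(self_reports) / len(self_reports)
    return observed / (1.0 - underreport_rate)


# Hypothetical data: 1 = cocaine use, 0 = no use.
self_reports = [1, 0, 0, 1, 0, 0, 1, 0]
urine_results = [1, 1, 0, 1, 1, 0, 1, 0]

rate = estimate_underreport_rate(self_reports, urine_results)
adjusted = corrected_mean(self_reports, rate)
```

In this toy data set the corrected mean matches the proportion of urine-positive weeks, which is the expected behavior when the contamination probability is estimated from the same accurate reference measurements.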


Subjects
Cocaine; Research Design; Bias; Computer Simulation; Humans; Self Report
17.
Am J Hum Genet ; 82(5): 1193-201, 2008 May.
Article in English | MEDLINE | ID: mdl-18439552

ABSTRACT

Data from the Pharmacogenomics and Risk of Cardiovascular Disease (PARC) study and the Cardiovascular Health Study (CHS) provide independent and confirmatory evidence for association between common polymorphisms of the HNF1A gene encoding hepatocyte nuclear factor-1 alpha and plasma C-reactive protein (CRP) concentration. Analyses with the use of imputation-based methods to combine genotype data from both studies and to test untyped SNPs from the HapMap database identified several SNPs within a 5 kb region of HNF1A intron 1 with the strongest evidence of association with CRP phenotype.


Subjects
C-Reactive Protein/genetics; Hepatocyte Nuclear Factor 1-alpha/genetics; Aged; Bayes Theorem; Female; Humans; Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use; Male; Middle Aged; Polymorphism, Single Nucleotide; Pravastatin/therapeutic use; Simvastatin/therapeutic use
18.
Biometrics ; 67(3): 926-36, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21133879

ABSTRACT

We introduce novel regression extrapolation based methods to correct the often large bias in subsampling variance estimation as well as hypothesis testing for spatial point and marked point processes. For variance estimation, our proposed estimators are linear combinations of the usual subsampling variance estimator based on subblock sizes in a continuous interval. We show that they can achieve better rates in mean squared error than the usual subsampling variance estimator. In particular, for n×n observation windows, the optimal rate of n^(-2) can be achieved if the data have a finite dependence range. For hypothesis testing, we apply the proposed regression extrapolation directly to the test statistics based on different subblock sizes, and therefore avoid the need to conduct bias correction for each element in the covariance matrix used to set up the test statistics. We assess the numerical performance of the proposed methods through simulation, and apply them to analyze a tropical forest data set.
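
The extrapolation idea can be sketched with ordinary least squares: compute the usual subsampling variance estimate at several subblock sizes, regress those estimates on a function of the block size, and read off the intercept as the bias-corrected value. The sketch assumes the bias is proportional to 1 / (block size), which is an assumption made here for illustration; the abstract does not state the regression form, and the function name and numbers are hypothetical.

```python
def extrapolated_variance(block_sizes, variance_estimates):
    """Intercept of an OLS fit of variance estimates on 1 / block size.

    The intercept is a linear combination of the input estimates and
    corresponds to the limit of infinitely large blocks under the
    assumed bias model.
    """
    x = [1.0 / b for b in block_sizes]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(variance_estimates) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, variance_estimates))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar


# Synthetic estimates with true variance 2.0 and a downward bias of 3 / b.
block_sizes = [4, 5, 8, 10]
naive_estimates = [2.0 - 3.0 / b for b in block_sizes]
corrected = extrapolated_variance(block_sizes, naive_estimates)
```

Every naive estimate in this synthetic example falls below the true value of 2.0, while the extrapolated intercept recovers it, because the data were generated to follow the assumed bias model exactly. With real data the fit would only be approximate.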


Subjects
Bias; Regression Analysis; Computer Simulation; Humans; Methods
19.
Biometrics ; 67(3): 730-9, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21361885

ABSTRACT

A typical recurrent event dataset consists of an often large number of recurrent event processes, each of which contains multiple event times observed from an individual during a follow-up period. Such data have become increasingly available in medical and epidemiological studies. In this article, we introduce novel procedures to conduct second-order analysis for a flexible class of semiparametric recurrent event processes. Such an analysis can provide useful information regarding the dependence structure within each recurrent event process. Specifically, we will use the proposed procedures to test whether the individual recurrent event processes are all Poisson processes and to suggest sensible alternative models for them if they are not. We apply these procedures to a well-known recurrent event dataset on chronic granulomatous disease and an epidemiological dataset on meningococcal disease cases in Merseyside, United Kingdom to illustrate their practical value.


Subjects
Data Interpretation, Statistical; Models, Statistical; Recurrence; Biometry/methods; Epidemiologic Studies; Granulomatous Disease, Chronic/pathology; Humans; Meningococcal Infections/epidemiology; United Kingdom
20.
Biometrics ; 67(3): 711-8, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21361887

ABSTRACT

This article is concerned with variance estimation for statistics that are computed from single recurrent event processes. Such statistics are important in diagnosis for each individual recurrent event process. The proposed method only assumes a semiparametric form for the first-order structure of the processes but not for the second-order (i.e., dependence) structure. The new variance estimator is shown to be consistent for the target parameter under very mild conditions. The estimator can be used in many applications in semiparametric rate regression analysis of recurrent event data such as outlier detection, residual diagnosis, as well as robust regression. A simulation study and application to two real data examples are used to demonstrate the use of the proposed method.


Subjects
Biometry/methods; Models, Statistical; Recurrence; Analysis of Variance; Regression Analysis