Search | BVS Research Portal

1.

A genome-wide approach for detecting novel insertion-deletion variants of mid-range size.

Xia, Li C; Sakshuwong, Sukolsak; Hopmans, Erik S; Bell, John M; Grimes, Susan M; Siegmund, David O; Ji, Hanlee P; Zhang, Nancy R.

Nucleic Acids Res ; 44(15): e126, 2016 09 06.

Article in English | MEDLINE | ID: mdl-27325742

ABSTRACT

We present SWAN, a statistical framework for robust detection of genomic structural variants in next-generation sequencing data and an analysis of mid-range size insertions and deletions (<10 Kb) for whole genome analysis and DNA mixtures. To identify these mid-range size events, SWAN collectively uses information from read-pair, read-depth and one-end-mapped reads through statistical likelihoods based on Poisson field models. SWAN also uses soft-clip/split read remapping to supplement the likelihood analysis and determine variant boundaries. The accuracy of SWAN is demonstrated by in silico spike-ins and by identification of known variants in the NA12878 genome. We used SWAN to identify a set of novel mid-range insertions and deletions that were confirmed by targeted deep re-sequencing. An R package implementation of SWAN is open source and freely available.

Subjects

DNA Mutational Analysis/methods, Genome/genetics, Genomics/methods, INDEL Mutation/genetics, Adenoviridae/genetics, Algorithms, Animals, Benchmarking, Computer Simulation, Datasets as Topic, Pan troglodytes/virology, Poisson Distribution, Reproducibility of Results

2.

Joint testing of genotype and ancestry association in admixed families.

Tang, Hua; Siegmund, David O; Johnson, Nicholas A; Romieu, Isabelle; London, Stephanie J.

Genet Epidemiol ; 34(8): 783-91, 2010 Dec.

Article in English | MEDLINE | ID: mdl-21031451

ABSTRACT

Current genome-wide association studies (GWAS) often involve populations that have experienced recent genetic admixture. Genotype data generated from these studies can be used to test for association directly, as in a non-admixed population. As an alternative, these data can be used to infer chromosomal ancestry, and thus allow for admixture mapping. We quantify the contribution of allele-based and ancestry-based association testing under a family design, and demonstrate that the two tests can provide non-redundant information. We propose a joint testing procedure, which efficiently integrates the two sources of information. The efficiencies of the allele, ancestry and combined tests are compared in the context of a GWAS. We discuss the impact of population history and provide guidelines for future design and analysis of GWAS in admixed populations.

Subjects

Alleles, Human Chromosomes, Population Genetics/statistics & numerical data, Genome-Wide Association Study/methods, Mexican Americans/genetics, American Indian or Alaska Native/genetics, Asthma, Black People/genetics, Child, Chromosome Mapping/methods, Confidence Intervals, Human Genome, Genotype, Humans, Parents, White People/genetics

3.

Ghost QTL and hotspots in experimental crosses: novel approach for modeling polygenic effects.

Wallin, Jonas; Bogdan, Malgorzata; Szulc, Piotr A; Doerge, R W; Siegmund, David O.

Genetics ; 217(3)2021 03 31.

Article in English | MEDLINE | ID: mdl-33789342

ABSTRACT

Ghost quantitative trait loci (QTL) are false discoveries in QTL mapping that arise from the "accumulation" of polygenic effects distributed uniformly over the genome. The locations on the chromosome that are strongly correlated with the total of the polygenic effects depend on a specific sample correlation structure determined by the genotypes at all loci. The problem is particularly severe when the same genotypes are used to study multiple QTL, e.g. using recombinant inbred lines or studying expression QTL. In this case, the ghost QTL phenomenon can lead to false hotspots, where multiple QTL show apparent linkage to the same locus. We illustrate the problem using the classic backcross design and suggest that it can be solved by applying an extended mixed effect model, where the random effects are allowed to have a nonzero mean. We provide formulas for estimating the thresholds for the corresponding t-test statistics and use them in a stepwise selection strategy, which allows for simultaneous detection of several QTL. Extensive simulation studies illustrate that our approach eliminates ghost QTL/false hotspots, while preserving a high power of true QTL detection.

Subjects

Genetic Crosses, Genetic Models, Multifactorial Inheritance, Quantitative Trait Loci, Animals, Breeding/methods, Genome-Wide Association Study/methods, Genome-Wide Association Study/standards, Plants/genetics

4.

Mapping quantitative traits in unselected families: algorithms and examples.

Dupuis, Josée; Shi, Jianxin; Manning, Alisa K; Benjamin, Emelia J; Meigs, James B; Cupples, L Adrienne; Siegmund, David.

Genet Epidemiol ; 33(7): 617-27, 2009 Nov.

Article in English | MEDLINE | ID: mdl-19278016

ABSTRACT

Linkage analysis has been widely used to identify from family data genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departure from normality assumptions. Regression-based approaches are more robust but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic, which in contrast to the likelihood ratio statistic can use nonparametric estimators of variability to achieve robustness of the false-positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity by descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study.

Subjects

Chromosome Mapping/methods, Quantitative Trait Loci, Algorithms, Environment, False Positive Reactions, Family Health, Female, Humans, Male, Genetic Models, Statistical Models, Molecular Epidemiology/methods, Multivariate Analysis, Pedigree, Phenotype, Reproducibility of Results

5.

A unified framework for linkage and association analysis of quantitative traits.

Dupuis, Josée; Siegmund, David O; Yakir, Benjamin.

Proc Natl Acad Sci U S A ; 104(51): 20210-5, 2007 Dec 18.

Article in English | MEDLINE | ID: mdl-18077372

ABSTRACT

We give a unified treatment of the statistical foundations of population based association mapping and of family based linkage mapping of quantitative traits in humans. A central ingredient in the unification involves the efficient score statistic. The discussion focuses on generalized linear models with an additional illustration of the Cox (proportional hazards) model for age of onset data. We give analytic expressions for noncentrality parameters and show how they give qualitative insight into the loss of power that occurs if the scientist's assumed genetic model differs from nature's "true" genetic model. Issues to be studied in detail in the future development of this approach are discussed.

Subjects

Chromosome Mapping, Statistical Data Interpretation, Genetic Models, Quantitative Trait Loci, Humans, Pedigree

6.

Approximating the variance of the conditional probability of the state of a hidden Markov model.

Siegmund, David O; Yakir, Benjamin.

Stat Appl Genet Mol Biol ; 6: Article 18, 2007.

Article in English | MEDLINE | ID: mdl-17672820

ABSTRACT

In a hidden Markov model, one "estimates" the state of the hidden Markov chain at t by computing via the forwards-backwards algorithm the conditional distribution of the state vector given the observed data. The covariance matrix of this conditional distribution measures the information lost by failure to observe directly the state of the hidden process. In the case where changes of state occur slowly relative to the speed at which information about the underlying state accumulates in the observed data, we compute approximately these covariances in terms of functionals of Brownian motion that arise in change-point analysis. Applications in gene mapping, where these covariances play a role in standardizing the score statistic and in evaluating the loss of noncentrality due to incomplete information, are discussed. Numerical examples illustrate the range of validity and limitations of our results.

Subjects

Markov Chains, Genetic Models, Probability Theory
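
The conditional distribution mentioned in the abstract of entry 6 can be illustrated with a short sketch. This is not the authors' code or their Brownian-motion approximation; it is a generic forward-backward pass for a discrete hidden Markov model, followed by the exact conditional variance of the state indicator in the two-state case. The function names and the array layout (`init`, `trans`, `lik`) are assumptions made for illustration.

```python
import numpy as np

def forward_backward(init, trans, lik):
    """Posterior state probabilities P(state_t = k | all observations).

    init:  (K,) initial state distribution
    trans: (K, K) transition matrix, rows sum to 1
    lik:   (T, K) likelihood of observation t under each state
    (Names and layout are illustrative assumptions.)
    """
    T, K = lik.shape
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    # Forward pass, normalised at each step to avoid numerical underflow.
    alpha[0] = init * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * lik[t]
        alpha[t] /= alpha[t].sum()
    # Backward pass, also normalised.
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (lik[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    # The product is proportional to the posterior; renormalise each row.
    post = alpha * beta
    post /= post.sum(axis=1, keepdims=True)
    return post

def conditional_variance(post):
    """Variance of the indicator of state 1 given the data (two-state chain)."""
    p = post[:, 1]
    return p * (1.0 - p)
```

A conditional variance near 0 means the data nearly determine the hidden state at that position; a value near 0.25 means the data carry almost no information about it.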

7.

Spatial regulation and the rate of signal transduction activation.

Batada, Nizar N; Shepp, Larry A; Siegmund, David O; Levitt, Michael.

PLoS Comput Biol ; 2(5): e44, 2006 May.

Article in English | MEDLINE | ID: mdl-16699596

ABSTRACT

Of the many important signaling events that take place on the surface of a mammalian cell, activation of signal transduction pathways via interactions of cell surface receptors is one of the most important. Evidence suggests that cell surface proteins are not as freely diffusible as implied by the classic fluid mosaic model and that their confinement to membrane domains is regulated. It is unknown whether these dynamic localization mechanisms function to enhance signal transduction activation rate or to minimize cross talk among pathways that share common intermediates. To determine which of these two possibilities is more likely, we derive an explicit equation for the rate at which cell surface membrane proteins interact based on a Brownian motion model in the presence of endocytosis and exocytosis. We find that in the absence of any diffusion constraints, cell surface protein interaction rate is extremely high relative to cytoplasmic protein interaction rate even in a large mammalian cell with a receptor abundance of a mere two hundred molecules. Since a larger number of downstream signaling events needs to take place, each occurring at a much slower rate than the initial activation via association of cell surface proteins, we conclude that the role of co-localization is most likely that of cross-talk reduction rather than coupling efficiency enhancement.

Subjects

Biological Models, Signal Transduction, Animals, Cytoplasm/metabolism, Dimerization, Proteins/chemistry, Proteins/metabolism, Time Factors

8.

Frequentist estimation of coalescence times from nucleotide sequence data using a tree-based partition.

Tang, Hua; Siegmund, David O; Shen, Peidong; Oefner, Peter J; Feldman, Marcus W.

Genetics ; 161(1): 447-59, 2002 May.

Article in English | MEDLINE | ID: mdl-12019257

ABSTRACT

This article proposes a method of estimating the time to the most recent common ancestor (TMRCA) of a sample of DNA sequences. The method is based on the molecular clock hypothesis, but avoids assumptions about population structure. Simulations show that in a wide range of situations, the point estimate has small bias and the confidence interval has at least the nominal coverage probability. We discuss conditions that can lead to biased estimates. Performance of this estimator is compared with existing methods based on the coalescence theory. The method is applied to sequences of Y chromosomes and mtDNAs to estimate the coalescent times of human male and female populations.

Subjects

DNA, Molecular Evolution, Phylogeny, Algorithms, Base Sequence, Computer Simulation, Mitochondrial DNA, Genetic Models, Molecular Sequence Data, Mutation, Selection Bias, Statistics as Topic, Y Chromosome

9.

Detecting simultaneous changepoints in multiple sequences.

Zhang, Nancy R; Siegmund, David O; Ji, Hanlee; Li, Jun Z.

Biometrika ; 97(3): 631-645, 2010 Sep.

Article in English | MEDLINE | ID: mdl-22822250

ABSTRACT

We discuss the detection of local signals that occur at the same location in multiple one-dimensional noisy sequences, with particular attention to relatively weak signals that may occur in only a fraction of the sequences. We propose simple scan and segmentation algorithms based on the sum of the chi-squared statistics for each individual sample, which is equivalent to the generalized likelihood ratio for a model where the errors in each sample are independent. The simple geometry of the statistic allows us to derive accurate analytic approximations to the significance level of such scans. The formulation of the model is motivated by the biological problem of detecting recurrent DNA copy number variants in multiple samples. We show using replicates and parent-child comparisons that pooling data across samples results in more accurate detection of copy number variants. We also apply the multisample segmentation algorithm to the analysis of a cohort of tumour samples containing complex nested and overlapping copy number aberrations, for which our method gives a sparse and intuitive cross-sample summary.
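
The scan described in the abstract of entry 9 can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it assumes every sequence has known mean 0 and variance 1 outside any signal, scans all intervals up to a maximum width, and sums the squared per-sequence z-scores. The function name `sum_chisq_scan` and the `max_width` parameter are hypothetical.

```python
import numpy as np

def sum_chisq_scan(y, max_width):
    """Scan all intervals (s, t] with 1 <= t - s <= max_width.

    y: (N, T) array of N aligned sequences of length T, assumed to have
       mean 0 and variance 1 outside any signal (assumption of this sketch).
    Returns (s, t, statistic) for the maximising interval.
    """
    n_seq, length = y.shape
    # Cumulative sums with a leading zero column, so that
    # csum[:, t] - csum[:, s] equals the sum of y[:, s:t].
    csum = np.concatenate([np.zeros((n_seq, 1)), np.cumsum(y, axis=1)], axis=1)
    best = (0, 1, -np.inf)
    for s in range(length):
        for t in range(s + 1, min(s + max_width, length) + 1):
            # One standardised interval sum (z-score) per sequence.
            z = (csum[:, t] - csum[:, s]) / np.sqrt(t - s)
            # Sum of the per-sequence chi-squared (1 df) statistics.
            stat = float(np.sum(z ** 2))
            if stat > best[2]:
                best = (s, t, stat)
    return best
```

Because the squared z-scores are added across sequences, a shift that is too weak to stand out in any single sequence can still produce a large pooled statistic when it recurs at the same location.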

10.

A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data.

Zhang, Nancy R; Siegmund, David O.

Biometrics ; 63(1): 22-32, 2007 Mar.

Article in English | MEDLINE | ID: mdl-17447926

ABSTRACT

In the analysis of data generated by change-point processes, one critical challenge is to determine the number of change-points. The classic Bayes information criterion (BIC) statistic does not work well here because of irregularities in the likelihood function. By asymptotic approximation of the Bayes factor, we derive a modified BIC for the model of Brownian motion with changing drift. The modified BIC is similar to the classic BIC in the sense that the first term consists of the log likelihood, but it differs in the terms that penalize for model dimension. As an example of application, this new statistic is used to analyze array-based comparative genomic hybridization (array-CGH) data. Array-CGH measures the number of chromosome copies at each genome location of a cell sample, and is useful for finding the regions of genome deletion and amplification in tumor cells. The modified BIC performs well compared to existing methods in accurately choosing the number of regions of changed copy number. Unlike existing methods, it does not rely on tuning parameters or intensive computing. Thus it is impartial and easier to understand and to use.

Subjects

Bayes Theorem, Genome, Genetic Models, Nucleic Acid Hybridization, Biometry, Cell Line, Computer Simulation, Humans, Monte Carlo Method, Oligonucleotide Array Sequence Analysis/methods

11.

Statistical corrections of linkage data suggest predominantly cis regulations of gene expression.

Shi, Jianxin; Siegmund, David O; Levinson, Douglas F.

BMC Proc ; 1 Suppl 1: S145, 2007.

Article in English | MEDLINE | ID: mdl-18466489

ABSTRACT

Morley et al. (Nature 2004, 430:743-747) detected significant linkages to the expression levels of 142 genes (of 3554) at a reported threshold of genome-wide p = 0.001 (LOD ≈ 5.3), using 14 three-generation Centre d'Etude du Polymorphisme Humain pedigrees. Most of the linkages (77%) were trans, i.e., more than 5 Mb from the expressed gene. However, the analysis did not account for the expected anti-conservative effect of the skewed distribution of score- or regression-based statistics in large sibships, or for the possible variance distortion due to correlations among tests. Therefore, we re-analyzed their data, using a robust score statistic for the entire pedigrees and correcting the p-values for skewness. We found that a LOD of 5.3 had a skewness-corrected genome-wide p-value of 0.016 instead of 0.001 (a result that we confirmed using simulation), with around 50 expected false positives. We then further corrected for correlation among the (skew-corrected) p-values by using Efron's method for obtaining the empirical null distribution. Setting a threshold of FDR = 10% (Z = 6.4, LOD = 8.9), we detected linkage for the expression levels of 22 genes, 19 of which are cis. Limiting the analysis to cis regions, linkage was detected to the expression levels of 46 genes with 4.6 expected false positives (FDR = 10%).
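
The paired Z and LOD thresholds quoted in the abstract of entry 11 are consistent with the standard one-degree-of-freedom relation LOD = Z^2 / (2 ln 10). The sketch below checks that arithmetic, along with the expected number of false positives implied by an FDR threshold. The helper names are hypothetical and the code is not taken from the paper.

```python
import math

def z_to_lod(z):
    """Convert a one-degree-of-freedom Z score to a LOD score."""
    return z * z / (2.0 * math.log(10.0))

def lod_to_z(lod):
    """Convert a LOD score back to the equivalent Z score."""
    return math.sqrt(2.0 * math.log(10.0) * lod)

def expected_false_positives(n_discoveries, fdr):
    """Expected number of false discoveries among n_discoveries at a given FDR."""
    return n_discoveries * fdr

print(round(z_to_lod(6.4), 1))                   # 8.9, the LOD quoted for Z = 6.4
print(round(lod_to_z(5.3), 2))                   # 4.94, the Z score behind LOD 5.3
print(round(expected_false_positives(46, 0.10), 1))  # 4.6, as stated for the cis analysis
```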

12.

Genome scans with gene-covariate interaction.

Peng, Jie; Tang, Hsiu-Khuern; Siegmund, David.

Genet Epidemiol ; 29(3): 173-84, 2005 Nov.

Article in English | MEDLINE | ID: mdl-16216012

ABSTRACT

Genetic models for gene-covariate interactions are described. Methods of linkage analysis that utilize special features of these models and the corresponding score statistics are derived. Their power is compared with that of simple genome scans that ignore these special features, and substantial gains in power are observed when the gene-covariate interaction is strong. Quantitative trait mapping in randomly ascertained sibships and affected sibpair mapping are discussed. For the latter case, a simpler statistic is proposed that has similar performance to the score statistic, but does not require the estimation of nuisance parameters. Since the nuisance parameters are not estimable solely from affected sib-pair data, this statistic would be much easier to apply in practice. Similarities with linkage analysis of models for longitudinal data and multivariate phenotypes are also briefly discussed. Approximations for the P-value and power are derived under the framework of local alternatives.

Subjects

Genetic Linkage, Human Genome, Genetic Models, Quantitative Trait Loci, Chromosome Mapping, Humans, Statistics as Topic

13.

On the power for linkage detection using a test based on scan statistics.

Hernández, Sonia; Siegmund, David O; de Gunst, Mathisca.

Biostatistics ; 6(2): 259-69, 2005 Apr.

Article in English | MEDLINE | ID: mdl-15772104

ABSTRACT

We analyze some aspects of scan statistics, which have been proposed to aid the detection of weak signals in genetic linkage analysis. We derive approximate expressions for the power of a test based on moving averages of the identity by descent allele sharing proportions for pairs of relatives at several contiguous markers. We confirm these approximate formulae by simulation. The results show that when there is a single trait-locus on a chromosome, the test based on the scan statistic is slightly less powerful than that based on the customary allele sharing statistic. On the other hand, if two genes having a moderate effect on a trait lie close to each other on the same chromosome, scan statistics improve power to detect linkage.

Subjects

Chromosome Mapping/methods, Statistical Data Interpretation, Genetic Linkage, Genetic Models, Computer Simulation, Humans

14.

Mapping multiple genes for quantitative or complex traits.

Tang, Hsiu-Khuern; Siegmund, David.

Genet Epidemiol ; 22(4): 313-27, 2002 Apr.

Article in English | MEDLINE | ID: mdl-11984864

ABSTRACT

Models for complex and quantitative traits that involve multiple, possibly interacting, genes are described. Methods of linkage analysis are developed that utilize special features of these models, and their power is compared with that of simple genome scans that ignore these special features. Our calculations show that for family-based nonparametric linkage analysis in human genetics, in contrast to experimental genetics, there are limits to the increase in power that can be achieved by correctly modeling gene-gene interactions. In particular, the noncentrality parameter of likelihood-based statistics to detect single gene effects involves both single gene and interaction components of variance, so even when the interaction components of variance are relatively large, the incremental power from a statistic designed to detect both single gene and interaction effects is often quite modest. We carry out our analysis with the assistance of a parameterization that allows us to compute score statistics, noncentrality parameters, and Fisher information matrices reasonably explicitly.

Subjects

Chromosome Mapping, Genetic Linkage/genetics, Heritable Quantitative Trait, Alleles, Animals, Genotype, Humans, Likelihood Functions, Genetic Models, Phenotype

15.

Stochastic model of protein-protein interaction: why signaling proteins need to be colocalized.

Batada, Nizar N; Shepp, Larry A; Siegmund, David O.

Proc Natl Acad Sci U S A ; 101(17): 6445-9, 2004 Apr 27.

Article in English | MEDLINE | ID: mdl-15096590

ABSTRACT

Colocalization of proteins that are part of the same signal transduction pathway via compartmentalization, scaffold, or anchor proteins is an essential aspect of the signal transduction system in eukaryotic cells. If interaction must occur via free diffusion, then the spatial separation between the sources of the two interacting proteins and their degradation rates become primary determinants of the time required for interaction. To understand the role of such colocalization, we create a mathematical model of the diffusion-based protein-protein interaction process. We assume that mRNAs, which serve as the sources of these proteins, are located at different positions in the cytoplasm. For large cells such as Drosophila oocytes, we show that if the source mRNAs were at random locations in the cell rather than colocalized, the average rate of interactions would be extremely small, which suggests that localization is needed to facilitate protein interactions and not just to prevent cross-talk between different signaling modules.

Subjects

Molecular Models, Proteins/metabolism, Proteins/chemistry

16.

Gene expression patterns and gene copy number changes in dermatofibrosarcoma protuberans.

Linn, Sabine C; West, Rob B; Pollack, Jonathan R; Zhu, Shirley; Hernandez-Boussard, Tina; Nielsen, Torsten O; Rubin, Brian P; Patel, Rajiv; Goldblum, John R; Siegmund, David; Botstein, David; Brown, Patrick O; Gilks, C Blake; van de Rijn, Matt.

Am J Pathol ; 163(6): 2383-95, 2003 Dec.

Article in English | MEDLINE | ID: mdl-14633610

ABSTRACT

Dermatofibrosarcoma protuberans (DFSP) is an aggressive spindle cell neoplasm. It is associated with the chromosomal translocation t(17;22), which fuses the COL1A1 and PDGF beta genes. We determined the characteristic gene expression profile of DFSP and characterized DNA copy number changes in DFSP by array-based comparative genomic hybridization (array CGH). Fresh frozen and formalin-fixed, paraffin-embedded samples of DFSP were analyzed by array CGH (four cases) and DNA microarray analysis of global gene expression (nine cases). The nine DFSPs were readily distinguished from 27 other diverse soft tissue tumors based on their gene expression patterns. Genes characteristically expressed in the DFSPs included PDGF beta and its receptor, PDGFRB, APOD, MEOX1, PLA2R, and PRKCA. Array CGH of DNA extracted either from frozen tumor samples or from paraffin blocks yielded equivalent results. Large areas of chromosomes 17q and 22q, bounded by COL1A1 and PDGF beta, respectively, were amplified in DFSP. Expression of genes in the amplified regions was significantly elevated. Our data show that: 1) DFSP has a distinctive gene expression profile; 2) array CGH can be applied successfully to frozen or formalin-fixed, paraffin-embedded tumor samples; 3) a characteristic amplification of sequences from chromosomes 17q and 22q, demarcated by the COL1A1 and PDGF beta genes, respectively, was associated with elevated expression of the amplified genes.

Subjects

Dermatofibrosarcoma/genetics, Gene Dosage, Gene Expression Profiling, Skin Neoplasms/genetics, Adult, Aged, Cryopreservation, Dermatofibrosarcoma/metabolism, Dermatofibrosarcoma/pathology, Female, Fixatives, Formaldehyde, Gene Expression, Humans, Immunohistochemistry, Male, Middle Aged, Nucleic Acid Hybridization, Oligonucleotide Array Sequence Analysis, Paraffin Embedding, Skin Neoplasms/metabolism, Skin Neoplasms/pathology, Tissue Fixation
