Results 1 - 5 of 5
1.
Nat Rev Genet ; 12(6): 443-51, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21587300

ABSTRACT

Meaningful analysis of next-generation sequencing (NGS) data, which are produced extensively by genetics and genomics studies, relies crucially on the accurate calling of SNPs and genotypes. Recently developed statistical methods both improve and quantify the considerable uncertainty associated with genotype calling, and will especially benefit the growing number of studies using low- to medium-coverage data. We review these methods and provide a guide for their use in NGS studies.
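
To make the notion of genotype uncertainty concrete, the following is a minimal sketch of likelihood-based genotype calling at a single biallelic site. It is not taken from the review: it assumes independent reads, Phred-scaled base qualities greater than zero, errors spread evenly over the three other bases, and a Hardy-Weinberg prior built from a hypothetical alternate-allele frequency `alt_freq`. The function name and parameters are illustrative only.

```python
import math


def genotype_posteriors(bases, quals, ref, alt, alt_freq):
    """Return [P(G=0), P(G=1), P(G=2)], where G counts copies of `alt`.

    Assumptions (not from the source): independent reads, Phred qualities > 0,
    uniform miscall model, Hardy-Weinberg prior with frequency `alt_freq`.
    """
    q = alt_freq
    prior = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]

    log_lik = [0.0, 0.0, 0.0]
    for base, qual in zip(bases, quals):
        err = 10 ** (-qual / 10)  # Phred score -> probability of a miscall
        # Probability of the observed base given the true allele on the read.
        p_ref = 1 - err if base == ref else err / 3
        p_alt = 1 - err if base == alt else err / 3
        for g in range(3):
            # A read comes from either chromosome with equal probability.
            log_lik[g] += math.log((2 - g) / 2 * p_ref + g / 2 * p_alt)

    # Combine prior and likelihood, normalising in a numerically stable way.
    shift = max(log_lik)
    weights = [prior[g] * math.exp(log_lik[g] - shift) for g in range(3)]
    total = sum(weights)
    return [w / total for w in weights]


# One high-quality read carrying the alternate allele: the heterozygote is
# the most probable call, but the posterior remains far from certain.
print(genotype_posteriors(["G"], [30], ref="A", alt="G", alt_freq=0.2))
```

With a single read the posterior stays spread over two genotypes, which is the situation low-coverage studies face; adding reads or a more informative prior sharpens the call.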


Subject(s)
Single Nucleotide Polymorphism, DNA Sequence Analysis, Alleles, Chromosome Mapping, Statistical Data Interpretation, Inborn Genetic Diseases/genetics, Genotype, Humans, Likelihood Functions, Linkage Disequilibrium, Probability
2.
Bioinformatics ; 28(15): 2008-15, 2012 Aug 01.
Article in English | MEDLINE | ID: mdl-22641715

ABSTRACT

MOTIVATION: A promising class of methods for large-scale population genomic inference uses the conditional sampling distribution (CSD), which approximates the probability of sampling an individual with a particular DNA sequence, given that a collection of sequences from the population has already been observed. The CSD has a wide range of applications, including imputing missing sequence data, estimating recombination rates, inferring human colonization history and identifying tracts of distinct ancestry in admixed populations. Most widely used CSDs are based on hidden Markov models (HMMs). Although computationally efficient in principle, methods resulting from the common implementation of the relevant HMM techniques remain intractable for large genomic datasets.

RESULTS: To address this issue, a set of algorithmic improvements for performing the exact HMM computation is introduced here, by exploiting the particular structure of the CSD and typical characteristics of genomic data. It is empirically demonstrated that these improvements result in a speedup of several orders of magnitude for large datasets and that the speedup continues to increase with the number of sequences. The optimized algorithms can be adopted in methods for various applications, including those mentioned above, and make previously impracticable analyses possible.

AVAILABILITY: Software available upon request.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

CONTACT: yss@eecs.berkeley.edu.


Subject(s)
Algorithms, Computational Biology/methods, Population Genetics/methods, Genomics/methods, Markov Chains, Haplotypes, Humans, Probability, Software
3.
Theor Popul Biol ; 87: 51-61, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23010245

ABSTRACT

Conditional sampling distributions (CSDs), sometimes referred to as copying models, underlie numerous practical tools in population genomic analyses. Although the inference of population structure is an important application that has received much attention, the explicit exchange of migrants at specified rates has not previously been incorporated into the CSD in a principled framework. Recently, in the case of a single panmictic population, a sequentially Markov CSD has been developed as an accurate, efficient approximation to a principled CSD derived from the diffusion process dual to the coalescent with recombination. In this paper, the sequentially Markov CSD framework is extended to incorporate subdivided population structure, thus providing an efficiently computable CSD that admits a genealogical interpretation related to the structured coalescent with migration and recombination. As a concrete application, it is demonstrated empirically that the CSD developed here can be employed to yield accurate estimation of a wide range of migration rates.


Subject(s)
Markov Chains, Genetic Recombination, Probability
4.
Genetics ; 187(4): 1115-28, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21270390

ABSTRACT

The sequentially Markov coalescent is a simplified genealogical process that aims to capture the essential features of the full coalescent model with recombination, while being scalable in the number of loci. In this article, the sequentially Markov framework is applied to the conditional sampling distribution (CSD), which is at the core of many statistical tools for population genetic analyses. Briefly, the CSD describes the probability that an additionally sampled DNA sequence is of a certain type, given that a collection of sequences has already been observed. A hidden Markov model (HMM) formulation of the sequentially Markov CSD is developed here, yielding an algorithm with time complexity linear in both the number of loci and the number of haplotypes. This work provides a highly accurate, practical approximation to a recently introduced CSD derived from the diffusion process associated with the coalescent with recombination. It is empirically demonstrated that the improvement in accuracy of the new CSD over previously proposed HMM-based CSDs increases substantially with the number of loci. The framework presented here can be adopted in a wide range of applications in population genetics, including imputing missing sequence data, estimating recombination rates, and inferring human colonization history.
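
As an illustration of how an HMM-based CSD can run in time linear in both the number of loci and the number of haplotypes, here is a minimal sketch of the forward algorithm for a generic copying model. It is not the sequentially Markov CSD of the article: it assumes a simple model in which the new haplotype copies one panel haplotype at each locus, switches to a uniformly chosen panel haplotype with a hypothetical probability `switch_prob` between adjacent loci, and miscopies a biallelic site with a hypothetical probability `mut_prob`. Because the switch target is uniform, the sum over previous states is computed once per locus and shared by all states.

```python
def copying_csd(new_hap, panel, switch_prob, mut_prob):
    """Approximate P(new_hap | panel) under a simple copying HMM.

    `panel` is a list of equal-length 0/1 haplotypes; `new_hap` has the same
    length. All parameter names are illustrative assumptions.
    """
    n = len(panel)
    num_loci = len(new_hap)

    def emit(k, locus):
        # Probability of the observed allele when copying panel haplotype k.
        same = panel[k][locus] == new_hap[locus]
        return 1 - mut_prob if same else mut_prob

    # Initial step: the copied haplotype is chosen uniformly from the panel.
    forward = [emit(k, 0) / n for k in range(n)]

    for locus in range(1, num_loci):
        # Shared term: probability mass arriving via a switch, O(n) per locus.
        jump = switch_prob * sum(forward) / n
        forward = [
            emit(k, locus) * ((1 - switch_prob) * forward[k] + jump)
            for k in range(n)
        ]

    # No rescaling is done here, so very long sequences could underflow.
    return sum(forward)


panel = [[0, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
print(copying_csd([0, 1, 1, 1], panel, switch_prob=0.1, mut_prob=0.01))
```

Each locus costs one pass over the panel, so the total work is proportional to the number of loci times the number of haplotypes, rather than quadratic in the panel size as a naive transition-matrix product would be.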


Subject(s)
Population Genetics, Markov Chains, Genetic Recombination, DNA Sequence Analysis/methods, Algorithms, Computer Simulation, Genetic Loci, Haplotypes, Humans, Genetic Models, Probability
5.
Genetics ; 186(1): 321-38, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20592264

ABSTRACT

The multilocus conditional sampling distribution (CSD) describes the probability that an additionally sampled DNA sequence is of a certain type, given that a collection of sequences has already been observed. The CSD has a wide range of applications in both computational biology and population genomics analysis, including phasing genotype data into haplotype data, imputing missing data, estimating recombination rates, inferring local ancestry in admixed populations, and importance sampling of coalescent genealogies. Unfortunately, the true CSD under the coalescent with recombination is not known, so approximations, formulated as hidden Markov models, have been proposed in the past. These approximations have led to a number of useful statistical tools, but it is important to recognize that they were not derived from, though were certainly motivated by, principles underlying the coalescent process. The goal of this article is to develop a principled approach to derive improved CSDs directly from the underlying population genetics model. Our approach is based on the diffusion process approximation and the resulting mathematical expressions admit intuitive genealogical interpretations, which we utilize to introduce further approximations and make our method scalable in the number of loci. The general algorithm presented here applies to an arbitrary number of loci and an arbitrary finite-alleles recurrent mutation model. Empirical results are provided to demonstrate that our new CSDs are in general substantially more accurate than previously proposed approximations.
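
For intuition about what a CSD is, it may help to look at a case where it is known in closed form. The sketch below is not the multilocus method of the article: it assumes a single locus with parent-independent mutation, where a lineage mutates at rate theta/2 and the new allele is drawn from fixed probabilities regardless of the parent. Under that assumption the CSD is (n_a + theta * P_a) / (n + theta), with n_a the count of allele a among the n observed sequences. The names `pim_csd`, `theta` and `stationary` are illustrative.

```python
from collections import Counter


def pim_csd(allele, sample, theta, stationary):
    """P(next sampled allele == `allele` | `sample`) at one locus.

    Assumes parent-independent mutation with scaled rate `theta` > 0 and
    allele probabilities `stationary` (a dict summing to 1).
    """
    counts = Counter(sample)
    return (counts[allele] + theta * stationary[allele]) / (len(sample) + theta)


def ordered_sample_probability(sample, theta, stationary):
    """Probability of an ordered sample as a product of successive CSDs."""
    prob = 1.0
    for i, allele in enumerate(sample):
        prob *= pim_csd(allele, sample[:i], theta, stationary)
    return prob


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(pim_csd("A", ["A", "A", "C"], theta=1.0, stationary=uniform))  # 0.5625
```

The second function shows why the CSD is useful: chaining it over the sample gives the sampling probability itself. With recombination across many loci no such closed form is available, which is why approximate CSDs are needed.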


Subject(s)
DNA/genetics, Genetic Models, Genetic Recombination, Algorithms, Diffusion, Female, Genetic Loci/genetics, Humans, Likelihood Functions, Male, Phylogeny, Probability, Sample Size