1.
BMC Bioinformatics ; 15 Suppl 7: S12, 2014.
Article in English | MEDLINE | ID: mdl-25080132

ABSTRACT

BACKGROUND: Since the advent of next-generation sequencing, many previously untestable hypotheses have become testable. Next-generation sequencing has been used for a wide range of studies in diverse fields such as population and medical genetics, phylogenetics, and microbiology. However, this technology has created unanticipated challenges, such as the large number of genetic variants. Each Caucasian genome has more than four million single nucleotide variants, insertions and deletions, copy number variants, and structural variants. Several formats have been suggested for storing these variants; however, the variant call format (VCF) has become the community standard. RESULTS: We developed new software called the Variant Tool Chest (VTC) to provide much-needed tools for working with VCF files. VTC provides a variety of tools for manipulating, comparing, and analyzing VCF files beyond the functionality of existing tools. In addition, VTC was written to be easily extended with new tools. CONCLUSIONS: Variant Tool Chest brings new and important functionality that complements and integrates well with existing software. VTC is available at https://github.com/mebbert/VariantToolChest.


Subject(s)
High-Throughput Nucleotide Sequencing/methods, Software, Genetic Databases, Genetic Variation, Human Genome, Genotype, Humans
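
The kind of VCF comparison mentioned in the abstract above can be illustrated with a minimal sketch. This is not VTC's code or interface; the function name `read_variants` and the sample records are hypothetical, and the only assumption about the format is the standard tab-separated VCF column order (CHROM, POS, ID, REF, ALT, ...).

```python
from typing import Iterable, Set, Tuple

# A variant is keyed by (CHROM, POS, REF, ALT); this key choice is an assumption
# made for illustration, not something taken from VTC.
VariantKey = Tuple[str, int, str, str]


def read_variants(lines: Iterable[str]) -> Set[VariantKey]:
    """Collect one key per ALT allele from VCF text lines."""
    variants: Set[VariantKey] = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):  # multi-allelic sites list ALTs comma-separated
            variants.add((chrom, pos, ref, alt))
    return variants


# Hypothetical in-memory records standing in for two VCF files.
sample_a = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\tPASS\t.",
    "1\t200\t.\tC\tT,G\t60\tPASS\t.",
]
sample_b = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t200\t.\tC\tT\t40\tPASS\t.",
    "2\t300\t.\tG\tA\t70\tPASS\t.",
]

a = read_variants(sample_a)
b = read_variants(sample_b)
shared = a & b   # variants present in both samples
only_a = a - b   # variants unique to sample A
```

Representing each file as a set of keys makes intersection and difference one-line operations, at the cost of ignoring genotype columns and normalization of indel representations.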
2.
Bioinformatics ; 29(11): 1361-6, 2013 Jun 01.
Article in English | MEDLINE | ID: mdl-23620357

ABSTRACT

MOTIVATION: Accurate determination of single-nucleotide polymorphisms (SNPs) from next-generation sequencing data is a significant challenge facing bioinformatics researchers. Most current methods use mechanistic models that assume nucleotides aligning to a given reference position are sampled from a binomial distribution. While such methods are sensitive, they are often unable to discriminate errors resulting from misaligned reads, sequencing errors or platform artifacts from true variants. RESULTS: To enable more accurate SNP calling, we developed an algorithm that uses a trained support vector machine (SVM) to determine variants from BAM- or SAM-formatted alignments of sequence reads. Our SVM-based implementation determines SNPs with significantly greater sensitivity and specificity than alternative platforms, including the UnifiedGenotyper included with the Genome Analysis Toolkit, samtools and FreeBayes. In addition, the quality scores produced by our implementation more accurately reflect the likelihood that a variant is real when compared with those produced by the Genome Analysis Toolkit. While results depend on the model used, the implementation includes tools to easily build new models and refine existing models with additional training data. AVAILABILITY: Source code and executables are available from github.com/brendanofallon/SNPSVM/


Subject(s)
High-Throughput Nucleotide Sequencing/methods, Single Nucleotide Polymorphism, DNA Sequence Analysis/methods, Support Vector Machine, Genomics, Sequence Alignment
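
The binomial assumption that the abstract above attributes to most current callers can be sketched as follows. This is a simplified illustration, not the model of any named tool and not the SVM approach of the paper; the three genotype classes, the per-base `error_rate` of 0.01, and the function names are assumptions chosen for the example.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def genotype_likelihoods(alt_reads: int, depth: int, error_rate: float = 0.01) -> dict:
    """Likelihood of the observed alternate-read count under three diploid genotypes.

    Assumed alternate-read probabilities: error_rate for homozygous reference,
    0.5 for heterozygous, and 1 - error_rate for homozygous alternate.
    """
    return {
        "hom_ref": binom_pmf(alt_reads, depth, error_rate),
        "het": binom_pmf(alt_reads, depth, 0.5),
        "hom_alt": binom_pmf(alt_reads, depth, 1 - error_rate),
    }


def call_genotype(alt_reads: int, depth: int, error_rate: float = 0.01) -> str:
    """Return the genotype with the highest likelihood (no prior applied)."""
    likelihoods = genotype_likelihoods(alt_reads, depth, error_rate)
    return max(likelihoods, key=likelihoods.get)


calls = [call_genotype(k, 20) for k in (0, 10, 20)]
```

Because the model sees only read counts, a cluster of misaligned reads carrying the same mismatch looks identical to a true heterozygous site, which is the weakness the abstract describes.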
3.
Proc Natl Acad Sci U S A ; 108(51): 20444-8, 2011 Dec 20.
Article in English | MEDLINE | ID: mdl-22143784

ABSTRACT

The genetic and demographic impact of European contact with Native Americans has remained unclear despite recent interest. Whereas archeological and historical records indicate that European contact resulted in widespread mortality from various sources, genetic studies have found little evidence of a recent contraction in Native American population size. In this study we use a large dataset including both ancient and contemporary mitochondrial DNA to construct a high-resolution portrait of the Holocene and late Pleistocene population size of indigenous Americans. Our reconstruction suggests that Native Americans suffered a significant, although transient, contraction in population size some 500 years before the present, during which female effective size was reduced by ∼50%. These results support analyses of historical records indicating that European colonization induced widespread mortality among indigenous Americans.


Subject(s)
Population Genetics, North American Indians/genetics, White People/genetics, Bayes Theorem, Mitochondrial DNA/genetics, Ethnicity/genetics, Europe, Female, Genetic Variation, Humans, Genetic Models, Molecular Sequence Data, Phylogeny, Population Dynamics, Software
4.
BMC Bioinformatics ; 14: 40, 2013 Feb 05.
Article in English | MEDLINE | ID: mdl-23379678

ABSTRACT

BACKGROUND: Reconstruction of population history from genetic data often requires Monte Carlo integration over the genealogy of the samples. Among tools that perform such computations, few are able to consider genetic histories including recombination events, precluding their use on most alignments of nuclear DNA. Explicit consideration of recombinations requires modeling the history of the sequences with an Ancestral Recombination Graph (ARG) in place of a simple tree, which presents significant computational challenges. RESULTS: ACG is an extensible desktop application that uses a Bayesian Markov chain Monte Carlo procedure to estimate the posterior likelihood of an evolutionary model conditional on an alignment of genetic data. The ancestry of the sequences is represented by an ARG, which is estimated from the data with other model parameters. Importantly, ACG computes the full, Felsenstein likelihood of the ARG, not a pairwise or composite likelihood. Several strategies are used to speed computations, and ACG is roughly 100x faster than a similar, recombination-aware program. CONCLUSIONS: Modeling the ancestry of the sequences with an ARG allows ACG to estimate the evolutionary history of recombining nucleotide sequences. ACG can accurately estimate the posterior distribution of population parameters such as the (scaled) population size and recombination rate, as well as many aspects of the recombinant history, including the positions of recombination breakpoints, the distribution of time to most recent common ancestor along the sequence, and the non-recombining trees at individual sites. Multiple substitution models and population size models are provided. ACG also provides a richly informative graphical interface that allows users to view the evolution of model parameters and likelihoods in real time.


Subject(s)
Pedigree, Genetic Recombination, DNA Sequence Analysis/methods, Software, Bayes Theorem, Molecular Evolution, Markov Chains, Genetic Models, Monte Carlo Method, Population Density, Sequence Alignment
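
The Markov chain Monte Carlo procedure named in the abstract above can be illustrated in its simplest form. The sketch below is a generic random-walk Metropolis sampler on a toy one-dimensional target (a standard normal); it is not ACG's sampler, it does not involve an ARG, and the target, step size, and function names are assumptions chosen only to show the accept/reject step.

```python
import math
import random


def log_target(x: float) -> float:
    """Unnormalized log density of the toy target, a standard normal."""
    return -0.5 * x * x


def metropolis(n_steps: int, step_size: float = 1.0, start: float = 0.0, seed: int = 0) -> list:
    """Random-walk Metropolis: propose x' = x + noise, accept with prob min(1, p(x')/p(x))."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        log_ratio = log_target(proposal) - log_target(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal  # accept the move
        samples.append(x)  # a rejected move repeats the current state
    return samples


samples = metropolis(20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

In a genealogical sampler the state would be a graph plus model parameters rather than a single number, and the proposals would modify that graph, but the acceptance rule has the same form.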
5.
BMC Bioinformatics ; 14 Suppl 13: S1, 2013.
Article in English | MEDLINE | ID: mdl-24268183

ABSTRACT

BACKGROUND: Identification of the genetic alterations responsible for human disease is a central challenge facing medical genetics. While many algorithms have been developed to predict the degree of damage caused by a given sequence alteration, few tools are able to incorporate information about a given phenotype of interest. METHODS: Here, we describe an algorithm and web-based application that take into account both the probability that a variant damages the function of a gene and the relevance of the gene to a given phenotype. Phenotypes are described by a list of scored terms supplied by the user. These terms are then used to search a variety of public databases, including NCBI gene summaries, PubMed abstracts, Gene Ontology terms, and protein-protein interactions in String-DB, to determine a relevance score. The overall ranking is determined by the product of the functional damage score and the relevance score, such that highly ranked variants are likely to be damaging and in genes of interest. RESULTS: We demonstrate the method on several test cases including samples with Hereditary Hemorrhagic Telangiectasia (HHT) and Diamond-Blackfan Anemia (DBA). We have also implemented a web-based application which allows public access to the VarRanker algorithm. CONCLUSIONS: Automated searching of public literature and online databases may substantially decrease the amount of time required to identify the mutations underlying human disease. However, several ad-hoc and subjective decisions must be made, and the results of such analyses are likely to depend on the researcher and the state of the literature and databases involved.


Subject(s)
Algorithms, Genomic Structural Variation, Mutation/genetics, Phenotype, Protein Sequence Analysis/classification, Diamond-Blackfan Anemia/genetics, Computational Biology, Humans, Information Storage and Retrieval/methods, Linear Models, Hereditary Hemorrhagic Telangiectasia/genetics, Controlled Vocabulary
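
The ranking rule stated in the abstract above, the product of a functional damage score and a relevance score, can be sketched in a few lines. The class name, the variant labels, and the score values are hypothetical, and the scores are assumed to lie in [0, 1]; how VarRanker actually computes either score is not shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredVariant:
    name: str
    damage: float     # hypothetical functional damage score, assumed in [0, 1]
    relevance: float  # hypothetical gene-to-phenotype relevance score, assumed in [0, 1]

    @property
    def rank_score(self) -> float:
        """Overall score: product of damage and relevance."""
        return self.damage * self.relevance


def rank_variants(variants: List[ScoredVariant]) -> List[ScoredVariant]:
    """Sort variants from highest to lowest overall score."""
    return sorted(variants, key=lambda v: v.rank_score, reverse=True)


candidates = [
    ScoredVariant("geneA:var1", damage=0.9, relevance=0.1),
    ScoredVariant("geneB:var2", damage=0.6, relevance=0.8),
    ScoredVariant("geneC:var3", damage=0.2, relevance=0.9),
]
ranked = rank_variants(candidates)
ranked_names = [v.name for v in ranked]
```

With a product, a variant must score reasonably on both axes to rank highly: the most damaging candidate here falls to last place because its gene has little relevance to the phenotype.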
6.
Mol Biol Evol ; 28(11): 3171-81, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21680870

ABSTRACT

The serial coalescent extends traditional coalescent theory to include genealogies in which not all individuals were sampled at the same time. Inference in this framework is powerful because population size and evolutionary rate may be estimated independently. However, when the sequences in question are affected by selection acting at many sites, the genealogies may differ significantly from their neutral expectation, and inference of demographic parameters may become inaccurate. I demonstrate that this inaccuracy is severe when the mutation rate and strength of selection are jointly large, and I develop a new likelihood calculation that, while approximate, improves the accuracy of population size estimates. When used in a Bayesian parameter estimation context, the new calculation allows for estimation of the shape of the pairwise coalescent rate function and can be used to detect the presence of selection acting at many sites in a sequence. Using the new method, I investigate two sets of dengue virus sequences from Puerto Rico and Thailand, and show that both genealogies are likely to have been distorted by selection.


Subject(s)
Molecular Evolution, Population Genetics/methods, Genetic Models, Population Density, Genetic Selection, Bayes Theorem, Dengue Virus/genetics, Likelihood Functions, Mutation/genetics, Puerto Rico, Thailand
7.
Mol Biol Evol ; 27(10): 2406-16, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20513741

ABSTRACT

Accurate reconstruction of the divergence times among individuals is an essential step toward inferring population parameters from genetic data. However, our ability to reconstruct accurate genealogies is often thwarted by the evolutionary forces we hope to detect, most prominently natural selection. Here, I demonstrate that purifying selection acting at many linked sites can systematically bias current methods of genealogical reconstruction, and I present a new method that corrects for this bias by allowing a class of sites to have a time-dependent rate. The parameters influencing the time dependency can be estimated from the data, yielding a general method to detect the presence of selected sites and correct for their distortion of the apparent mutation rate. The method works well under a variety of scenarios, including gamma-distributed selection coefficients as well as entirely neutral evolution. I also compare the performance of the new method to relaxed clock models, and I demonstrate the method on a data set from the mitochondrion of the North Atlantic whale-"louse" Cyamus ovalis.


Subject(s)
Molecular Evolution, Population Genetics, Genetic Models, Pedigree, Genetic Selection, Amphipoda/genetics, Animals, Computer Simulation, Mitochondrial DNA/genetics, Likelihood Functions, Mutation/genetics
8.
Mol Biol Evol ; 27(5): 1162-72, 2010 May.
Article in English | MEDLINE | ID: mdl-20097659

ABSTRACT

Coalescent theory provides an elegant and powerful method for understanding the shape of gene genealogies and resulting patterns of genetic diversity. However, the coalescent does not naturally accommodate the effects of heritable variation in fitness. Although some methods are available for studying the effects of strong selection (Ns >> 1), few tools beyond forward simulation are available for quantifying the impact of weak selection at many sites. Here, we introduce a continuous-state coalescent capable of accurately describing the distortions to genealogies caused by moderate to weak natural selection affecting many linked sites. We calculate approximately the full distribution of pairwise coalescent times, the lengths of coalescent intervals, and the time to the most recent common ancestor of a sample. Weak selection (Ns ≈ 1) is found to substantially decrease the tree depth, primarily through a shortening of the lengths of the basal coalescent intervals. Additionally, we demonstrate that only two parameters, population size and the variance of the distribution describing fitness heritability, are sufficient to describe most changes.


Subject(s)
Computational Biology/methods, Genes/genetics, Phylogeny, Genetic Selection, Computer Simulation, Inheritance Patterns/genetics, Genetic Models, Time Factors
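
For orientation, the neutral baseline against which the abstract above measures distortion can be simulated directly. The sketch below implements only the standard neutral (Kingman) coalescent, not the continuous-state coalescent of the paper; time is assumed to be in coalescent units, so that with k lineages the waiting time to the next coalescence is exponential with rate k(k-1)/2, and the function names are chosen for illustration.

```python
import random


def coalescent_intervals(n: int, rng: random.Random) -> list:
    """Waiting times T_n, ..., T_2 while n, n-1, ..., 2 lineages remain (neutral model)."""
    return [rng.expovariate(k * (k - 1) / 2) for k in range(n, 1, -1)]


def expected_tmrca(n: int) -> float:
    """Neutral expectation of the time to the most recent common ancestor: 2(1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


rng = random.Random(1)
n, reps = 10, 20000
mean_tmrca = sum(sum(coalescent_intervals(n, rng)) for _ in range(reps)) / reps
```

Under neutrality the basal interval T_2 has expectation 1, more than half of the expected tree depth for any sample size, which is why a shortening of the basal intervals has such a large effect on depth.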
9.
Proc Biol Sci ; 274(1629): 3159-64, 2007 Dec 22.
Article in English | MEDLINE | ID: mdl-17939983

ABSTRACT

Most models of quasi-species evolution predict that populations will evolve to occupy areas of sequence space with the greatest concentration of neutral sequences, thus minimizing the deleterious mutation rate and creating mutationally 'robust' genomes. In contrast, empirical studies of the principal model of quasi-species evolution, RNA viruses, suggest that the effects of deleterious mutations are more severe than in similar DNA-based microbes. We demonstrate that populations divided into discrete patches connected by dispersal may favour genotypes where the deleterious effect of non-neutral mutations is maximized. This effect is especially strong in the absence of back mutation and when the amount of time spent in hosts prior to dispersal is intermediate. Our results indicate that RNA viruses that produce acute infections initiated by a small number of virions are expected to evolve fragile genetic architectures when compared with other RNA viruses.


Subject(s)
Biological Evolution, Biological Models, Mutation, RNA Viruses/genetics, Genetic Selection, Computer Simulation, Genotype