Results 1 - 20 of 21
1.
Bioinformatics ; 2024 Sep 19.
Article in English | MEDLINE | ID: mdl-39298478

ABSTRACT

MOTIVATION: Detection of germline variants in next-generation sequencing data is an essential component of modern genomics analysis. Variant detection tools typically rely on statistical algorithms such as de Bruijn graphs or Hidden Markov Models, and are often coupled with heuristic techniques and thresholds to maximize accuracy. Despite significant progress in recent years, current methods still generate thousands of false positive detections in a typical human whole genome, creating a significant manual review burden. RESULTS: We introduce a new approach that replaces the handcrafted statistical techniques of previous methods with a single deep generative model. Using a standard transformer-based encoder and double-decoder architecture, our model learns to construct diploid germline haplotypes in a generative fashion identical to modern Large Language Models (LLMs). We train our model on 37 Whole Genome Sequences (WGS) from Genome-in-a-Bottle samples, and demonstrate that our method learns to produce accurate haplotypes with correct phase and genotype for all classes of small variants. We compare our method, called Jenever, to FreeBayes, GATK HaplotypeCaller, Clair3 and DeepVariant, and demonstrate that our method has superior overall accuracy compared to other methods. At F1-maximizing quality thresholds, our model delivers the highest sensitivity, precision, and the fewest genotyping errors for insertion and deletion variants. For single nucleotide variants our model demonstrates the highest sensitivity but at somewhat lower precision, and achieves the highest overall F1 score among all callers we tested. AVAILABILITY AND IMPLEMENTATION: Jenever is implemented as a Python-based command line tool. Source code is available at https://github.com/ARUP-NGS/jenever/.
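The abstract compares callers "at F1-maximizing quality thresholds." As a hedged sketch of what that phrase means, and not the authors' benchmarking code, the function below assumes each emitted call has already been labeled as a true or false positive against a truth set. It sweeps a minimum-quality cutoff and keeps the cutoff with the highest F1 = 2PR/(P + R), where P is precision and R is sensitivity (recall). The input format and the name `f1_at_best_threshold` are illustrative assumptions.

```python
from typing import List, Tuple


def f1_at_best_threshold(calls: List[Tuple[float, bool]], n_truth: int) -> Tuple[float, float]:
    """Return (best_threshold, best_f1) over the distinct quality values in `calls`.

    calls:   (quality, is_true_positive) for every variant the caller emitted.
    n_truth: number of variants in the truth set (denominator of sensitivity).
    A call is kept when its quality is >= the threshold.
    """
    best = (0.0, 0.0)
    for threshold in sorted({q for q, _ in calls}):
        kept = [is_tp for q, is_tp in calls if q >= threshold]
        tp = sum(kept)          # True counts as 1
        fp = len(kept) - tp
        if tp == 0:
            continue            # F1 is zero (or undefined) with no true positives
        precision = tp / (tp + fp)
        recall = tp / n_truth
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best[1]:
            best = (threshold, f1)
    return best
```

For five calls with qualities 50, 40, 30, 20 and 10 (labeled true, true, false, true, false) against a truth set of four variants, the best cutoff is 20, where precision and recall are both 0.75.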

2.
BMC Bioinformatics ; 23(1): 285, 2022 Jul 19.
Article in English | MEDLINE | ID: mdl-35854218

ABSTRACT

BACKGROUND: Copy number variants (CNVs) play a significant role in human heredity and disease. However, sensitive and specific characterization of germline CNVs from NGS data has remained challenging, particularly for hybridization-capture data in which read counts are the primary source of copy number information. RESULTS: We describe two algorithmic adaptations that improve CNV detection accuracy in a Hidden Markov Model (HMM) context. First, we present a method for computing target- and copy number-specific emission distributions. Second, we demonstrate that the Pointwise Maximum a posteriori (PMAP) HMM decoding procedure yields improved sensitivity for small CNV calls compared to the more common Viterbi HMM decoder. We develop a prototype implementation, called Cobalt, and compare it to other CNV detection tools using sets of simulated and previously detected CNVs with sizes spanning a single exon to a full chromosome. CONCLUSIONS: In both the simulation and previously detected CNV studies Cobalt shows similar sensitivity but significantly fewer false positive detections compared to other callers. Overall sensitivity is 80-90% for deletion CNVs spanning 1-4 targets and 90-100% for larger deletion events, while sensitivity is somewhat lower for small duplication CNVs.
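The abstract contrasts Viterbi decoding, which returns the single most probable state path, with pointwise maximum a posteriori (PMAP) decoding, which picks the most probable state at each position from forward-backward posteriors. The minimal sketch below implements both for a discrete-emission HMM. It is not Cobalt's code: hidden states could stand for copy-number levels and observations for discretized read depth, but that mapping, the parameter layout (`start[s]`, `trans[r][s]`, `emit[s][o]`) and the use of plain probabilities (real implementations work in log space or rescale) are all assumptions.

```python
from typing import List


def viterbi(obs: List[int], start: List[float],
            trans: List[List[float]], emit: List[List[float]]) -> List[int]:
    """Single most probable hidden-state path (maximum over whole paths)."""
    n = len(start)
    # score[s]: probability of the best path ending in state s at the current step
    score = [start[s] * emit[s][obs[0]] for s in range(n)]
    backptrs: List[List[int]] = []
    for o in obs[1:]:
        ptr: List[int] = []
        new_score: List[float] = []
        for s in range(n):
            prev = max(range(n), key=lambda r: score[r] * trans[r][s])
            ptr.append(prev)
            new_score.append(score[prev] * trans[prev][s] * emit[s][o])
        backptrs.append(ptr)
        score = new_score
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):  # trace the best predecessors backwards
        state = ptr[state]
        path.append(state)
    return path[::-1]


def posterior_decode(obs: List[int], start: List[float],
                     trans: List[List[float]], emit: List[List[float]]) -> List[int]:
    """Pointwise MAP: argmax over s of P(state_t = s | all observations), per t."""
    n, T = len(start), len(obs)
    # forward: fwd[t][s] = P(obs[0..t], state_t = s)
    fwd = [[start[s] * emit[s][obs[0]] for s in range(n)]]
    for t in range(1, T):
        fwd.append([sum(fwd[t - 1][r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                    for s in range(n)])
    # backward: bwd[t][s] = P(obs[t+1..T-1] | state_t = s)
    bwd = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        bwd[t] = [sum(trans[s][r] * emit[r][obs[t + 1]] * bwd[t + 1][r] for r in range(n))
                  for s in range(n)]
    # the posterior is proportional to fwd * bwd; normalizing would not change the argmax
    return [max(range(n), key=lambda s: fwd[t][s] * bwd[t][s]) for t in range(T)]
```

Because PMAP sums over every path through each position, it can label a short run of targets as altered even when no single high-probability path does. This is one plausible intuition for the sensitivity gain the abstract reports for small CNVs. The trade-off is that the pointwise path, taken as a whole, is not guaranteed to be a high-probability (or even permitted) path.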


Subjects
DNA Copy Number Variations; High-Throughput Nucleotide Sequencing; Algorithms; Computer Simulation; Exons; Germ Cells; High-Throughput Nucleotide Sequencing/methods; Humans

3.
Genet Med ; 21(9): 2007-2014, 2019 09.
Article in English | MEDLINE | ID: mdl-30760892

ABSTRACT

PURPOSE: EPHB4 variants were recently reported to cause capillary malformation-arteriovenous malformation 2 (CM-AVM2). CM-AVM2 mimics RASA1-related CM-AVM1 and hereditary hemorrhagic telangiectasia (HHT), as clinical features include capillary malformations (CMs), telangiectasia, and arteriovenous malformations (AVMs). Epistaxis, another clinical feature that overlaps with HHT, was reported in several cases. Based on the clinical overlap of CM-AVM2 and HHT, we hypothesized that patients considered clinically suspicious for HHT with no variant detected in an HHT gene (ENG, ACVRL1, or SMAD4) may have an EPHB4 variant. METHODS: Exome sequencing or a next-generation sequencing panel including EPHB4 was performed on individuals with previously negative molecular genetic testing for the HHT genes and/or RASA1. RESULTS: An EPHB4 variant was identified in ten unrelated cases. Seven cases had a pathogenic EPHB4 variant, including one with mosaicism. Three cases had an EPHB4 variant of uncertain significance. The majority had epistaxis (6/10 cases) and telangiectasia (8/10 cases), as well as CMs. Two of ten cases had a central nervous system AVM. CONCLUSIONS: Our results emphasize the importance of considering CM-AVM2 as part of the clinical differential for HHT and other vascular malformation syndromes. Yet, these cases highlight significant differences in the cutaneous presentations of CM-AVM2 versus HHT.


Subjects
Capillaries/abnormalities; Genetic Testing; Receptor, EphB4/genetics; Telangiectasia, Hereditary Hemorrhagic/genetics; Vascular Malformations/genetics; Activin Receptors, Type II/genetics; Adolescent; Capillaries/pathology; Child; Endoglin/genetics; Female; Humans; Male; Mutation; Smad4 Protein/genetics; Telangiectasia, Hereditary Hemorrhagic/diagnosis; Telangiectasia, Hereditary Hemorrhagic/pathology; Vascular Malformations/pathology; Exome Sequencing

4.
Am J Hum Genet ; 93(3): 530-7, 2013 Sep 05.
Article in English | MEDLINE | ID: mdl-23972370

ABSTRACT

Hereditary hemorrhagic telangiectasia (HHT), the most common inherited vascular disorder, is caused by mutations in genes involved in the transforming growth factor beta (TGF-β) signaling pathway (ENG, ACVRL1, and SMAD4). Yet, approximately 15% of individuals with clinical features of HHT do not have mutations in these genes, suggesting that there are undiscovered mutations in other genes for HHT and possibly vascular disorders with overlapping phenotypes. The genetic etiology for 191 unrelated individuals clinically suspected to have HHT was investigated with the use of exome and Sanger sequencing; these individuals had no mutations in ENG, ACVRL1, and SMAD4. Mutations in BMP9 (also known as GDF2) were identified in three unrelated probands. These three individuals had epistaxis and dermal lesions that were described as telangiectases but whose location and appearance resembled lesions described in some individuals with RASA1-related disorders (capillary malformation-arteriovenous malformation syndrome). Analyses of the variant proteins suggested that mutations negatively affect protein processing and/or function, and a bmp9-deficient zebrafish model demonstrated that BMP9 is involved in angiogenesis. These data confirm a genetic cause of a vascular-anomaly syndrome that has phenotypic overlap with HHT.


Subjects
Blood Vessels/abnormalities; Growth Differentiation Factors/genetics; Mutation/genetics; Telangiectasia, Hereditary Hemorrhagic/genetics; Telangiectasia, Hereditary Hemorrhagic/pathology; Adolescent; Adult; Amino Acid Substitution/genetics; Animals; Female; Genetic Predisposition to Disease; Growth Differentiation Factor 2; Humans; Ligands; Male; Mice; Mutation, Missense/genetics; Phenotype; Protein Binding; Protein Processing, Post-Translational; Signal Transduction/genetics; Syndrome; Transforming Growth Factor beta/genetics; Zebrafish/genetics

5.
BMC Bioinformatics ; 15 Suppl 7: S12, 2014.
Article in English | MEDLINE | ID: mdl-25080132

ABSTRACT

BACKGROUND: Since the advent of next-generation sequencing, many previously untestable hypotheses have been realized. Next-generation sequencing has been used for a wide range of studies in diverse fields such as population and medical genetics, phylogenetics, microbiology, and others. However, this novel technology has created unanticipated challenges such as the large number of genetic variants. Each Caucasian genome has more than four million single nucleotide variants, insertions and deletions, copy number variants, and structural variants. Several formats have been suggested for storing these variants; however, the variant call format (VCF) has become the community standard. RESULTS: We developed new software called the Variant Tool Chest (VTC) to provide much-needed tools to work with VCF files. VTC provides a variety of tools for manipulating, comparing, and analyzing VCF files beyond the functionality of existing tools. In addition, VTC was written to be easily extended with new tools. CONCLUSIONS: Variant Tool Chest brings new and important functionality that complements and integrates well with existing software. VTC is available at https://github.com/mebbert/VariantToolChest.
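The abstract does not describe VTC's own commands, so what follows is only a hypothetical sketch of the kind of VCF comparison it mentions. It reads the first five fixed VCF columns (CHROM, POS, ID, REF, ALT), splits a multi-allelic ALT field into one key per allele, and compares two files with set operations. The function names are illustrative, and a real comparison would also need variant normalization (for example, left-aligning indels), which this sketch omits.

```python
from typing import Iterable, Set, Tuple

# (CHROM, POS, REF, ALT) identifies one alternate allele at one site
VariantKey = Tuple[str, int, str, str]


def variant_keys(lines: Iterable[str]) -> Set[VariantKey]:
    """Collect one (CHROM, POS, REF, ALT) key per alternate allele from VCF text."""
    keys: Set[VariantKey] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta-information (##) and header (#CHROM) lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # multi-allelic sites list ALT alleles comma-separated
            if allele != ".":          # "." means no alternate allele
                keys.add((chrom, int(pos), ref, allele))
    return keys


def compare_vcfs(a: Iterable[str], b: Iterable[str]):
    """Return (shared, only_in_a, only_in_b) sets of variant keys."""
    keys_a, keys_b = variant_keys(a), variant_keys(b)
    return keys_a & keys_b, keys_a - keys_b, keys_b - keys_a
```

Usage: pass open file handles, e.g. `compare_vcfs(open("a.vcf"), open("b.vcf"))`, where the file names are placeholders. Compressed `.vcf.gz` input would need `gzip.open(path, "rt")` instead.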


Subjects
High-Throughput Nucleotide Sequencing/methods; Software; Databases, Genetic; Genetic Variation; Genome, Human; Genotype; Humans

6.
Bioinformatics ; 29(11): 1361-6, 2013 Jun 01.
Article in English | MEDLINE | ID: mdl-23620357

ABSTRACT

MOTIVATION: Accurate determination of single-nucleotide polymorphisms (SNPs) from next-generation sequencing data is a significant challenge facing bioinformatics researchers. Most current methods use mechanistic models that assume nucleotides aligning to a given reference position are sampled from a binomial distribution. While such methods are sensitive, they are often unable to discriminate errors resulting from misaligned reads, sequencing errors or platform artifacts from true variants. RESULTS: To enable more accurate SNP calling, we developed an algorithm that uses a trained support vector machine (SVM) to determine variants from .BAM or .SAM formatted alignments of sequence reads. Our SVM-based implementation determines SNPs with significantly greater sensitivity and specificity than alternative platforms, including the UnifiedGenotyper included with the Genome Analysis Toolkit, samtools and FreeBayes. In addition, the quality scores produced by our implementation more accurately reflect the likelihood that a variant is real when compared with those produced by the Genome Analysis Toolkit. While results depend on the model used, the implementation includes tools to easily build new models and refine existing models with additional training data. AVAILABILITY: Source code and executables are available from github.com/brendanofallon/SNPSVM/
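To make the "binomial" baseline in this abstract concrete, here is a minimal sketch of that kind of mechanistic model. It is not SNPSVM, which replaces this approach with a trained SVM. Each diploid genotype implies a probability that a read carries the alternate base: the error rate for 0/0, 0.5 for 0/1, and 1 minus the error rate for 1/1. The observed alt-read count is then scored with a binomial likelihood. The single fixed error rate, which ignores base and mapping qualities, and the function names are simplifying assumptions. The model has no way to tell a misalignment or platform artifact apart from a true allele, which is the limitation the abstract points to.

```python
import math
from typing import Dict


def genotype_likelihoods(n_reads: int, n_alt: int, error_rate: float = 0.01) -> Dict[str, float]:
    """Binomial likelihood of seeing n_alt alt-supporting reads out of n_reads per genotype."""
    # probability that a single read shows the alternate base under each genotype
    p_alt = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}
    return {
        gt: math.comb(n_reads, n_alt) * p ** n_alt * (1.0 - p) ** (n_reads - n_alt)
        for gt, p in p_alt.items()
    }


def most_likely_genotype(n_reads: int, n_alt: int, error_rate: float = 0.01) -> str:
    """Genotype with the highest likelihood (flat prior over genotypes)."""
    lik = genotype_likelihoods(n_reads, n_alt, error_rate)
    return max(lik, key=lik.get)
```

With 30 reads, 0 alt reads give 0/0, 14 give 0/1 and 30 give 1/1.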


Subjects
High-Throughput Nucleotide Sequencing/methods; Polymorphism, Single Nucleotide; Sequence Analysis, DNA/methods; Support Vector Machine; Genomics; Sequence Alignment

7.
Proc Natl Acad Sci U S A ; 108(51): 20444-8, 2011 Dec 20.
Article in English | MEDLINE | ID: mdl-22143784

ABSTRACT

The genetic and demographic impact of European contact with Native Americans has remained unclear despite recent interest. Whereas archeological and historical records indicate that European contact resulted in widespread mortality from various sources, genetic studies have found little evidence of a recent contraction in Native American population size. In this study we use a large dataset including both ancient and contemporary mitochondrial DNA to construct a high-resolution portrait of the Holocene and late Pleistocene population size of indigenous Americans. Our reconstruction suggests that Native Americans suffered a significant, although transient, contraction in population size some 500 y before the present, during which female effective size was reduced by ∼50%. These results support analyses of historical records indicating that European colonization induced widespread mortality among indigenous Americans.


Subjects
Genetics, Population; Indians, North American/genetics; White People/genetics; Bayes Theorem; DNA, Mitochondrial/genetics; Ethnicity/genetics; Europe; Female; Genetic Variation; Humans; Models, Genetic; Molecular Sequence Data; Phylogeny; Population Dynamics; Software

8.
BMC Bioinformatics ; 14: 40, 2013 Feb 05.
Article in English | MEDLINE | ID: mdl-23379678

ABSTRACT

BACKGROUND: Reconstruction of population history from genetic data often requires Monte Carlo integration over the genealogy of the samples. Among tools that perform such computations, few are able to consider genetic histories including recombination events, precluding their use on most alignments of nuclear DNA. Explicit consideration of recombinations requires modeling the history of the sequences with an Ancestral Recombination Graph (ARG) in place of a simple tree, which presents significant computational challenges. RESULTS: ACG is an extensible desktop application that uses a Bayesian Markov chain Monte Carlo procedure to estimate the posterior likelihood of an evolutionary model conditional on an alignment of genetic data. The ancestry of the sequences is represented by an ARG, which is estimated from the data with other model parameters. Importantly, ACG computes the full, Felsenstein likelihood of the ARG, not a pairwise or composite likelihood. Several strategies are used to speed computations, and ACG is roughly 100x faster than a similar, recombination-aware program. CONCLUSIONS: Modeling the ancestry of the sequences with an ARG allows ACG to estimate the evolutionary history of recombining nucleotide sequences. ACG can accurately estimate the posterior distribution of population parameters such as the (scaled) population size and recombination rate, as well as many aspects of the recombinant history, including the positions of recombination breakpoints, the distribution of time to most recent common ancestor along the sequence, and the non-recombining trees at individual sites. Multiple substitution models and population size models are provided. ACG also provides a richly informative graphical interface that allows users to view the evolution of model parameters and likelihoods in real time.


Subjects
Pedigree; Recombination, Genetic; Sequence Analysis, DNA/methods; Software; Bayes Theorem; Evolution, Molecular; Markov Chains; Models, Genetic; Monte Carlo Method; Population Density; Sequence Alignment

9.
BMC Bioinformatics ; 14 Suppl 13: S1, 2013.
Article in English | MEDLINE | ID: mdl-24268183

ABSTRACT

BACKGROUND: Identification of the genetic alterations responsible for human disease is a central challenge facing medical genetics. While many algorithms have been developed to predict the degree of damage caused by a given sequence alteration, few tools are able to incorporate information about a given phenotype of interest. METHODS: Here, we describe an algorithm and web-based application that take into account both the probability that a variant damages the function of a gene and the relevance of the gene to a given phenotype. Phenotypes are described by a list of scored terms supplied by the user. These terms are then used to search a variety of public databases, including NCBI gene summaries, PubMed abstracts, Gene Ontology terms, and protein-protein interactions in String-DB, to determine a relevance score. The overall ranking is determined by the product of the functional damage score and the relevance score, such that highly ranked variants are likely to be damaging and in genes of interest. RESULTS: We demonstrate the method on several test cases including samples with Hereditary Hemorrhagic Telangiectasia (HHT) and Diamond-Blackfan Anemia (DBA). We have also implemented a web-based application which allows public access to the VarRanker algorithm. CONCLUSIONS: Automated searching of public literature and online databases may substantially decrease the amount of time required to identify the mutations underlying human disease. However, several ad-hoc and subjective decisions must be made, and the results of such analyses are likely to depend on the researcher and the state of the literature and databases involved.
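The abstract states that the final VarRanker ordering is the product of a functional damage score and a gene relevance score. The sketch below shows only that combination step, under stated assumptions. Both scores are taken as already computed, since the database-search step that produces relevance scores is not reproduced here. Variants in genes without a relevance score get 0, and the names and example values are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_variants(damage: Dict[str, float],
                  relevance_by_gene: Dict[str, float],
                  gene_of: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank variants by damage score x relevance of their gene, highest first."""
    scored = [
        # assumption: genes missing from relevance_by_gene are treated as irrelevant (0.0)
        (variant, damage[variant] * relevance_by_gene.get(gene_of[variant], 0.0))
        for variant in damage
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Because the scores are multiplied, a variant must score reasonably on both axes to rank highly. For example, a highly damaging variant in a gene unrelated to the phenotype (0.9 × 0.1) ranks below a moderately damaging one in a relevant gene (0.2 × 1.0).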


Subjects
Algorithms; Genomic Structural Variation; Mutation/genetics; Phenotype; Sequence Analysis, Protein/classification; Anemia, Diamond-Blackfan/genetics; Computational Biology; Humans; Information Storage and Retrieval/methods; Linear Models; Telangiectasia, Hereditary Hemorrhagic/genetics; Vocabulary, Controlled

10.
Mol Biol Evol ; 28(11): 3171-81, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21680870

ABSTRACT

The serial coalescent extends traditional coalescent theory to include genealogies in which not all individuals were sampled at the same time. Inference in this framework is powerful because population size and evolutionary rate may be estimated independently. However, when the sequences in question are affected by selection acting at many sites, the genealogies may differ significantly from their neutral expectation, and inference of demographic parameters may become inaccurate. I demonstrate that this inaccuracy is severe when the mutation rate and strength of selection are jointly large, and I develop a new likelihood calculation that, while approximate, improves the accuracy of population size estimates. When used in a Bayesian parameter estimation context, the new calculation allows for estimation of the shape of the pairwise coalescent rate function and can be used to detect the presence of selection acting at many sites in a sequence. Using the new method, I investigate two sets of dengue virus sequences from Puerto Rico and Thailand, and show that both genealogies are likely to have been distorted by selection.


Subjects
Evolution, Molecular; Genetics, Population/methods; Models, Genetic; Population Density; Selection, Genetic; Bayes Theorem; Dengue Virus/genetics; Likelihood Functions; Mutation/genetics; Puerto Rico; Thailand

11.
Bioinformatics ; 32(14): 2242, 2016 07 15.
Article in English | MEDLINE | ID: mdl-27197814

12.
Mol Biol Evol ; 27(10): 2406-16, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20513741

ABSTRACT

Accurate reconstruction of the divergence times among individuals is an essential step toward inferring population parameters from genetic data. However, our ability to reconstruct accurate genealogies is often thwarted by the evolutionary forces we hope to detect, most prominently natural selection. Here, I demonstrate that purifying selection acting at many linked sites can systematically bias current methods of genealogical reconstruction, and I present a new method that corrects for this bias by allowing a class of sites to have a time-dependent rate. The parameters influencing the time dependency can be estimated from the data, allowing for a general method to detect the presence of selected sites and correcting for their distortion of the apparent mutation rate. The method works well under a variety of scenarios, including gamma-distributed selection coefficients as well as entirely neutral evolution. I also compare the performance of the new method to relaxed clock models, and I demonstrate the method on a data set from the mitochondrion of the North Atlantic whale-"louse" Cyamus ovalis.


Subjects
Evolution, Molecular; Genetics, Population; Models, Genetic; Pedigree; Selection, Genetic; Amphipoda/genetics; Animals; Computer Simulation; DNA, Mitochondrial/genetics; Likelihood Functions; Mutation/genetics

13.
Mol Biol Evol ; 27(5): 1162-72, 2010 May.
Article in English | MEDLINE | ID: mdl-20097659

ABSTRACT

Coalescent theory provides an elegant and powerful method for understanding the shape of gene genealogies and resulting patterns of genetic diversity. However, the coalescent does not naturally accommodate the effects of heritable variation in fitness. Although some methods are available for studying the effects of strong selection (Ns ≫ 1), few tools beyond forward simulation are available for quantifying the impact of weak selection at many sites. Here, we introduce a continuous-state coalescent capable of accurately describing the distortions to genealogies caused by moderate to weak natural selection affecting many linked sites. We calculate approximately the full distribution of pairwise coalescent times, the lengths of coalescent intervals, and the time to the most recent common ancestor of a sample. Weak selection (Ns ≈ 1) is found to substantially decrease the tree depth, primarily through a shortening of the lengths of the basal coalescent intervals. Additionally, we demonstrate that only two parameters, population size and the variance of the distribution describing fitness heritability, are sufficient to describe most changes.


Subjects
Computational Biology/methods; Genes/genetics; Phylogeny; Selection, Genetic; Computer Simulation; Inheritance Patterns/genetics; Models, Genetic; Time Factors

14.
Bioinformatics ; 26(17): 2200-1, 2010 09 01.
Article in English | MEDLINE | ID: mdl-20671150

ABSTRACT

MOTIVATION: Most population genetic simulators fall into one of two classes: backward-time simulators that quickly generate trees but accommodate only relatively simple selective and demographic regimes, and forward simulators that allow for a broader range of evolutionary scenarios but cannot produce genealogies. Thus, few tools are available for producing genealogies under arbitrarily complex selective and demographic models. RESULTS: TreesimJ is a forward-time population genetic simulator that allows for sampling of genealogies, genetic data and many population parameters from populations evolving under complex evolutionary scenarios. The application provides many fitness and demographic models, and new models are easy to develop. Data collection is performed by a variety of independently configurable collectors, which periodically sample the population and record statistics. Output options include writing traces, histograms and summary statistics from the data collectors in addition to sampled genetic sequences and genealogies. SUMMARY: TreesimJ allows researchers to easily sample and analyze gene genealogies and related data from populations evolving under a wide variety of selective and demographic regimes. It is likely to be useful for population genetic researchers seeking to understand the links between evolutionary and demographic forces, genealogical structure and the resulting patterns of genetic variation. AVAILABILITY: TreesimJ home: http://staff.washington.edu/brendano/treesimj. Source and developer resources: http://code.google.com/p/treesimj.
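TreesimJ is a Java application, and its models and collector interfaces are not described in the abstract. The Python sketch below therefore only illustrates the two ideas the abstract names: forward-time simulation, and independently scheduled "collectors" that periodically record statistics. It assumes a haploid population, one biallelic locus, a selection coefficient `s` favoring allele A, and Wright-Fisher resampling. It does not track genealogies, which is TreesimJ's distinguishing feature. All names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

# A collector receives (generation, count_of_allele_A) and records whatever it needs.
Collector = Callable[[int, int], None]


def next_generation(count: int, pop_size: int, s: float, rng: random.Random) -> int:
    """One Wright-Fisher generation: selection on allele A, then binomial drift."""
    p = count / pop_size
    mean_fitness = p * (1.0 + s) + (1.0 - p)       # A has relative fitness 1 + s
    p_after_selection = p * (1.0 + s) / mean_fitness
    # each of the pop_size offspring independently inherits A with that probability
    return sum(rng.random() < p_after_selection for _ in range(pop_size))


def simulate(pop_size: int, p0: float, s: float, generations: int,
             collectors: List[Tuple[int, Collector]], seed: int = 0) -> int:
    """Run the forward-time simulation and return the final count of allele A."""
    rng = random.Random(seed)
    count = round(p0 * pop_size)
    for gen in range(generations + 1):
        for interval, collect in collectors:
            if gen % interval == 0:                # each collector has its own schedule
                collect(gen, count)
        if gen < generations:
            count = next_generation(count, pop_size, s, rng)
    return count
```

Usage: `freqs = []; simulate(100, 0.5, 0.01, 50, [(10, lambda g, c: freqs.append((g, c / 100)))], seed=1)` records the frequency of A every 10 generations. Further collectors, such as one recording only fixation events, can be added to the list without changing the simulation loop.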


Subjects
Computer Simulation; Genetics, Population/methods; Models, Genetic; Software; Biological Evolution; User-Computer Interface

15.
J Theor Biol ; 276(1): 150-8, 2011 May 07.
Article in English | MEDLINE | ID: mdl-21291893

ABSTRACT

Pathogen species with high mutation rates are likely to accumulate deleterious mutations that reduce their reproductive potential within the host. By altering the within-host growth rate of the pathogen, the deleterious mutation load has the potential to affect epidemiological properties such as prevalence, mean pathogen load, and the mean duration of infections. Here, I examine an epidemiological model that allows for multiple segregating mutations that affect within-host replication efficiency. The model demonstrates a complex range of outcomes depending on pathogen mutation rate, including two distinct, widely separated mutation rates associated with high pathogen prevalence. The low mutation rate prevalence peak is associated with small amounts of genetic diversity within the pathogen population, relatively stable prevalence and infection dynamics, and genetic variation partitioned between hosts. The high mutation rate peak is characterized by considerable genetic diversity both within and between hosts, relatively frequent invasions by more virulent types, and is qualitatively similar to an RNA virus quasispecies. The two prevalence peaks are separated by a valley where natural selection favors evolution toward the optimal within-host state, which is associated with high virulence and relatively rapid host mortality. Both chronic and acute infections are examined using stochastic forward simulations.


Subjects
Host-Pathogen Interactions/genetics; Models, Biological; Mutation/genetics; Communicable Diseases/epidemiology; Communicable Diseases/genetics; Communicable Diseases/microbiology; Computer Simulation; Humans; Prevalence

16.
Evolution ; 62(2): 361-73, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18070083

ABSTRACT

Many obligately intracellular symbionts exhibit a characteristic set of genetic changes that include an increase in substitution rates, loss of many genes, and apparent destabilization of many proteins and structural RNAs. Authors have suggested that these changes are due to increased mutation rates, or, more commonly, decreased effective population size due to population bottlenecks at the symbiont or, perhaps, host level. I propose that the increase in substitution rates and accumulation of deleterious mutations is a consequence of the population structure imposed on the endosymbionts by strict host association, loss of horizontal transmission and potentially conflicting levels of selection. I analyze a population genetic model of endosymbiont evolution, and demonstrate that substitution rates will increase, and the effect of those substitutions on endosymbiont fitness will become more deleterious as horizontal transmission among hosts decreases. Additionally, I find that there is a critical level of horizontal transmission below which natural selection cannot effectively purge deleterious mutations, leading to an expected loss of fitness over time. This critical level varies across loci with the degree of correlation between host and endosymbiont fitness, and may help explain differential retention and loss of certain genes.


Subjects
Biological Evolution; Evolution, Molecular; Genetics, Population; Insects/microbiology; Mutation; RNA/chemistry; Selection, Genetic; Symbiosis; Alleles; Animals; Bacteria/genetics; Models, Genetic; Models, Statistical; Probability

17.
Proc Biol Sci ; 274(1629): 3159-64, 2007 Dec 22.
Article in English | MEDLINE | ID: mdl-17939983

ABSTRACT

Most models of quasi-species evolution predict that populations will evolve to occupy areas of sequence space with the greatest concentration of neutral sequences, thus minimizing the deleterious mutation rate and creating mutationally 'robust' genomes. In contrast, empirical studies of the principal model of quasi-species evolution, RNA viruses, suggest that the effects of deleterious mutations are more severe than in similar DNA-based microbes. We demonstrate that populations divided into discrete patches connected by dispersal may favour genotypes where the deleterious effect of non-neutral mutations is maximized. This effect is especially strong in the absence of back mutation and when the amount of time spent in hosts prior to dispersal is intermediate. Our results indicate that RNA viruses that produce acute infections initiated by a small number of virions are expected to evolve fragile genetic architectures when compared with other RNA viruses.


Subjects
Biological Evolution; Models, Biological; Mutation; RNA Viruses/genetics; Selection, Genetic; Computer Simulation; Genotype

18.
Evolution ; 60(3): 448-59, 2006 Mar.
Article in English | MEDLINE | ID: mdl-16637490

ABSTRACT

Sewall Wright's shifting balance theory of evolution posits a mechanism by which a structured population may escape local fitness optima and find a global optimum. We examine a one-locus, two-allele model of underdominance in populations with differing spatial arrangements of demes, both analytically and with Monte Carlo simulations. We find that inclusion of variance in interpatch connectivities can significantly reduce the number of generations required for fixation of the more favorable allele relative to island and stepping-stone models. Although time to fixation increases with migration rate in all cases, the presence of one or two relatively isolated demes may reduce the number of generations by 80% or more. These results suggest that the shifting balance process may operate under less restrictive conditions than those found with a simple spatial arrangement of demes.
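To make the "local fitness optimum" in this abstract concrete, here is a minimal deterministic sketch of selection at one diploid locus with underdominance, where the heterozygote is the least fit genotype. It assumes a single large randomly mating deme, with fitnesses w11, w12 and w22 for A1A1, A1A2 and A2A2, and uses the standard recursion p' = p(p·w11 + q·w12)/w̄, where w̄ = p²·w11 + 2pq·w12 + q²·w22. Setting p' = p gives an unstable interior equilibrium p* = (w22 − w12)/(w11 − 2·w12 + w22). Below p*, selection drives the fitter allele out; above it, selection fixes it. The sketch does not reproduce the paper's multi-deme Monte Carlo model. It only shows the barrier that drift and migration among demes must carry a population across, and the fitness values are illustrative.

```python
def next_freq(p: float, w11: float, w12: float, w22: float) -> float:
    """One generation of viability selection at a diploid locus, random mating.

    p is the frequency of allele A1; underdominance means w12 < min(w11, w22).
    """
    q = 1.0 - p
    mean_fitness = p * p * w11 + 2.0 * p * q * w12 + q * q * w22
    return p * (p * w11 + q * w12) / mean_fitness


def unstable_equilibrium(w11: float, w12: float, w22: float) -> float:
    """Interior fixed point p*, where the marginal fitnesses of A1 and A2 are equal."""
    return (w22 - w12) / (w11 - 2.0 * w12 + w22)


def iterate(p: float, w11: float, w12: float, w22: float, generations: int) -> float:
    """Apply next_freq repeatedly and return the final frequency of A1."""
    for _ in range(generations):
        p = next_freq(p, w11, w12, w22)
    return p
```

With w11 = 1.1, w12 = 0.9 and w22 = 1.0, p* = 1/3. A deme starting at p = 0.30 loses the fitter A1 allele, while one starting at p = 0.37 fixes it, even though A1A1 is the fittest genotype in both cases.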


Subjects
Biological Evolution; Models, Biological; Gene Frequency; Markov Chains; Population Dynamics

19.
Nat Genet ; 47(3): 235-41, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25665008

ABSTRACT

Natural variation within species reveals aspects of genome evolution and function. The fission yeast Schizosaccharomyces pombe is an important model for eukaryotic biology, but researchers typically use one standard laboratory strain. To extend the usefulness of this model, we surveyed the genomic and phenotypic variation in 161 natural isolates. We sequenced the genomes of all strains, finding moderate genetic diversity (π = 3 × 10⁻³ substitutions/site) and weak global population structure. We estimate that dispersal of S. pombe began during human antiquity (∼340 BCE), and ancestors of these strains reached the Americas at ∼1623 CE. We quantified 74 traits, finding substantial heritable phenotypic diversity. We conducted 223 genome-wide association studies, with 89 traits showing at least one association. The most significant variant for each trait explained 22% of the phenotypic variance on average, with indels having larger effects than SNPs. This analysis represents a rich resource to examine genotype-phenotype relationships in a tractable model.
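The diversity figure in this abstract, π = 3 × 10⁻³ substitutions/site, is nucleotide diversity: the average number of differences per site between two randomly chosen sequences. A value of 3 × 10⁻³ therefore means about 3 differences per 1,000 aligned sites between two strains. The sketch below computes the basic estimator over all sequence pairs. It is not the study's pipeline, and it assumes gap-free, equal-length aligned sequences with no missing data, which genome-scale analyses must handle explicitly.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Mean number of pairwise differences per site over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    # count mismatching sites in every pair, then average over pairs and sites
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * length)
```

For the three toy sequences AAAA, AAAT and AATT, the three pairs differ at 1, 2 and 1 sites, so π = 4 / (3 × 4) = 1/3.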


Subjects
Genome, Fungal; Schizosaccharomyces/genetics; Genetic Variation; Genome-Wide Association Study/methods; Genomics/methods; Genotype; Humans; Phenotype; Polymorphism, Single Nucleotide

20.
Genetics ; 194(2): 485-92, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23589459

ABSTRACT

The extent to which selective forces shape patterns of genetic and genealogical variation is unknown in many species. Recent theoretical models have suggested that even relatively weak purifying selection may produce significant distortions in gene genealogies, but few studies have sought to quantify this effect in humans. Here, we employ a reconstruction method based on the ancestral recombination graph to infer genealogies across the length of the human X chromosome and to examine time to most recent common ancestor (TMRCA) and measures of tree imbalance at both broad and very fine scales. In agreement with theory, TMRCA is significantly reduced and genealogies are significantly more imbalanced in coding regions and introns when compared to intergenic regions, and these effects are increased in areas of greater evolutionary constraint. These distortions are present at multiple scales, and chromosomal regions as broad as 5 Mb show a significant negative correlation in TMRCA with exon density. We also show that areas of recent TMRCA are significantly associated with the disease-causing potential of sites as measured by the MutationTaster prediction algorithm. Together, these findings suggest that purifying selection has significantly distorted human genealogical structure on both broad and fine scales and that few chromosomal regions escape selection-induced distortions.


Subjects
Chromosomes, Human, X/genetics; Pedigree; Selection, Genetic; DNA, Intergenic/genetics; Ecthyma, Contagious/genetics; Evolution, Molecular; Exons; Humans; Models, Genetic; Recombination, Genetic