ABSTRACT
Detection of extremely rare variant alleles, such as tumor DNA, within a complex mixture of DNA molecules is experimentally challenging because of sequencing errors. Barcoding target DNA molecules during library construction for next-generation sequencing provides a way to identify and bioinformatically remove polymerase-induced errors. During a barcoding procedure involving $t$ consecutive PCR cycles, the DNA molecules are tagged with Unique Molecular Identifiers (UMIs). Different library construction protocols use different values of $t$. How larger values of $t$ and imperfect PCR amplification affect UMI cluster sizes is poorly described. This paper proposes a branching process with growing immigration as a model of the random outcome of $t$ cycles of PCR barcoding. The model distinguishes five amplification rates, $r_1, r_2, r_3, r_4, r$, for the different types of molecules associated with the PCR barcoding procedure. We study this model by focusing on $C_t$, the number of clusters of molecules sharing the same UMI, and on $C_t(m)$, the number of UMI clusters of size $m$. Our main finding is a remarkable asymptotic pattern that holds for moderately large $t$: $E(C_t(m))/E(C_t) \approx 2^{-m}$ for $m = 1, 2, \ldots$, regardless of the underlying parameters $(r_1, r_2, r_3, r_4, r)$. Knowing $C_t$ and $C_t(m)$ as functions of the experimental parameters $t$ and $(r_1, r_2, r_3, r_4, r)$ will help users draw more adequate conclusions from the outcomes of different sequencing protocols.
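As a rough illustration of how the $2^{-m}$ pattern could be checked against data, the Python sketch below takes a list of UMI cluster sizes (one entry per UMI cluster; this input format and the function name `cluster_size_fractions` are assumptions, not part of the paper) and compares the observed fraction of clusters of size $m$ with $2^{-m}$. The count of clusters plays the role of $C_t$ and the count of size-$m$ clusters the role of $C_t(m)$. The paper's statement concerns a ratio of expectations, so fractions from a single sample only approximate it.

```python
from collections import Counter


def cluster_size_fractions(cluster_sizes, max_m=5):
    """Compare observed UMI cluster-size fractions with the 2**-m pattern.

    cluster_sizes: one integer per UMI cluster, giving the number of molecules
    carrying that UMI (assumed input format).
    Returns a list of (m, observed fraction C(m)/C, reference 2**-m).
    """
    sizes = list(cluster_sizes)
    total = len(sizes)  # number of UMI clusters, the analogue of C_t
    if total == 0:
        raise ValueError("need at least one UMI cluster")
    counts = Counter(sizes)  # counts[m] is the analogue of C_t(m)
    return [(m, counts[m] / total, 2.0 ** -m) for m in range(1, max_m + 1)]


if __name__ == "__main__":
    # Hypothetical cluster sizes, chosen only to illustrate the output format.
    example = [1] * 8 + [2] * 4 + [3] * 2 + [4] + [5]
    for m, observed, reference in cluster_size_fractions(example, max_m=4):
        print(f"m={m}: observed {observed:.4f}, 2^-m = {reference:.4f}")
```

Averaging the fractions over replicate libraries would bring the comparison closer to the expectation-based statement.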
Subjects
Emigration and Immigration, High-Throughput Nucleotide Sequencing, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Polymerase Chain Reaction/methods, DNA
ABSTRACT
Polyploidy is an important speciation mechanism, particularly in land plants. Allopolyploid species are formed through hybridization between otherwise intersterile parental species. Recent theoretical progress has led to the successful implementation of species tree models that take population genetic parameters into account. However, these models have not included allopolyploid hybridization or the special problems that arise when species trees of allopolyploids are inferred. Here, two new models for the statistical inference of the evolutionary history of allopolyploids are evaluated using simulations and demonstrated on two empirical data sets. It is assumed that there has been a single hybridization event between two diploid species, resulting in a genomic allotetraploid. The evolutionary history can be represented as a species network or as a multilabeled species tree, in which some pairs of tips are labeled with the same species. In one of the models (AlloppMUL), the multilabeled species tree is inferred directly. This is the simpler model and the more widely applicable one, since it makes fewer assumptions. The second model (AlloppNET) incorporates the hybridization event explicitly, which means that fewer parameters need to be estimated. Both models are implemented in the BEAST framework. Simulations show that both models are useful and that AlloppNET is more accurate when the assumptions on which it is based are valid. The models are demonstrated on previously analyzed data from the genera Pachycladon (Brassicaceae) and Silene (Caryophyllaceae).
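To make the idea of a multilabeled species tree concrete, the Python sketch below writes a toy tree in Newick notation in which the allotetraploid appears as two tips with the same label, and then reports which labels occur more than once. The species names A, B and AB, the tree shape, and the helper `duplicated_tip_labels` are all hypothetical; this is a sketch of the data structure, not how AlloppMUL or BEAST represent trees internally.

```python
import re
from collections import Counter


def duplicated_tip_labels(newick):
    """Return the sorted tip labels that occur more than once in a Newick string.

    A tip label is taken to be a name directly following '(' or ','.
    Internal-node labels follow ')' and are therefore ignored; branch lengths
    after ':' are not captured. Quoted labels are not supported.
    """
    labels = re.findall(r"[(,]\s*([A-Za-z_][A-Za-z0-9_]*)", newick)
    return sorted(label for label, n in Counter(labels).items() if n > 1)


# Hypothetical MUL-tree: diploids A and B, allotetraploid AB whose two
# subgenomes each sit next to one diploid parent lineage.
mul_tree = "((A:1.0,AB:1.0):0.5,(B:1.0,AB:1.0):0.5);"
print(duplicated_tip_labels(mul_tree))
```

An ordinary species tree has no repeated tip labels, so the function returns an empty list for it.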
Subjects
Brassicaceae/genetics, Molecular Evolution, Polyploidy, Silene/genetics, Bayes Theorem, Genetic Hybridization, Genetic Models
ABSTRACT
We consider a stochastic process for the generation of species that combines a Yule process with a simple model of hybridization between pairs of coexisting species. We assume that the process originated, with a single species, at an unknown time in the past, and we condition it on producing $n$ species via the Yule process and a single hybridization event. We prove results about the distribution of the time of the hybridization event. In particular, we derive a formula for all of its moments and show that, under various conditions, the distribution tends to an exponential distribution whose rate is twice the birth rate of the Yule process.
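In the limit described above, the hybridization time $T$ is approximately exponential with rate $2\lambda$, where $\lambda$ denotes the Yule birth rate (the symbol is introduced here, not in the abstract). The moments of that limiting distribution are the standard exponential moments, $E[T^k] = k!/(2\lambda)^k$. The Python sketch below evaluates them; the function name `limiting_moment` is hypothetical, and the code does not reproduce the exact finite-$n$ moment formula derived in the paper.

```python
from math import factorial


def limiting_moment(k, birth_rate):
    """k-th raw moment E[T^k] of an exponential distribution with rate 2 * birth_rate.

    This is only the limiting distribution of the hybridization time,
    not the exact finite-n moment formula.
    """
    if k < 0 or birth_rate <= 0:
        raise ValueError("need k >= 0 and a positive birth rate")
    rate = 2.0 * birth_rate
    return factorial(k) / rate ** k


for k in range(1, 4):
    print(k, limiting_moment(k, birth_rate=1.0))
```

For instance, the limiting mean hybridization time is $1/(2\lambda)$, half the expected waiting time for a single lineage to speciate.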