1.
PLoS Comput Biol ; 18(8): e1010407, 2022 08.
Article in English | MEDLINE | ID: mdl-35921376

ABSTRACT

Estimating the mutation rate, or equivalently effective population size, is a common task in population genetics. If recombination is low or high, optimal linear estimation methods are known and well understood. For intermediate recombination rates, the calculation of optimal estimators is more challenging. As an alternative to model-based estimation, neural networks and other machine learning tools could help to develop good estimators in these involved scenarios. However, if no benchmark is available it is difficult to assess how well suited these tools are for different applications in population genetics. Here we investigate feedforward neural networks for the estimation of the mutation rate based on the site frequency spectrum and compare their performance with model-based estimators. For this we use the model-based estimators introduced by Fu, Futschik et al., and Watterson that minimize the variance or mean squared error for no and free recombination. We find that neural networks reproduce these estimators if provided with the appropriate features and training sets. Remarkably, using the model-based estimators to adjust the weights of the training data, only one hidden layer is necessary to obtain a single estimator that performs almost as well as model-based estimators for low and high recombination rates, and at the same time provides a superior estimation method for intermediate recombination rates. We apply the method to simulated data based on the human chromosome 2 recombination map, highlighting its robustness in a realistic setting where local recombination rates vary and/or are unknown.


Subjects
Genetics, Population , Mutation Rate , Computer Simulation , Humans , Neural Networks, Computer , Recombination, Genetic/genetics
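
As a point of reference for the model-based estimators named in the abstract above, the following is a minimal sketch of Watterson's estimator computed from an unfolded site frequency spectrum. The function name, the list layout of the spectrum, and the example counts are assumptions made for illustration; this is not the authors' code and says nothing about the neural-network estimator itself.

```python
def watterson_theta(sfs):
    """Watterson's estimator from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of segregating sites at which the derived
    allele appears in exactly i of the n sampled sequences (i = 1..n-1).
    """
    n = len(sfs) + 1                          # sample size implied by the spectrum
    segregating_sites = sum(sfs)              # S, total number of segregating sites
    a_n = sum(1.0 / i for i in range(1, n))   # harmonic number H_{n-1}
    return segregating_sites / a_n


# Hypothetical spectrum for n = 5 sequences (made-up counts).
example_sfs = [12, 6, 4, 3]
theta_hat = watterson_theta(example_sfs)
```

The estimator uses only the total number of segregating sites, so every frequency class gets the same weight; the linear estimators discussed in the abstract differ in how they weight the classes.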
2.
Bioinformatics ; 37(18): 3061-3063, 2021 09 29.
Article in English | MEDLINE | ID: mdl-33738486

ABSTRACT

MOTIVATION: When performing genome-wide association studies conventionally the additive genetic model is used to explore whether a single nucleotide polymorphism (SNP) is associated with a quantitative trait. But for variants, which do not follow an intermediate mode of inheritance (MOI), the recessive or the dominant genetic model can have more power to detect associations and furthermore the MOI is important for downstream analyses and clinical interpretation. When multiple MOIs are modelled the question arises, which describes the true underlying MOI best. RESULTS: We developed an R-package allowing for the first time to determine study specific critical values when one of the three models is more informative than the other ones for a quantitative trait locus. The package allows for user-friendly simulations to determine these critical values with predefined minor allele frequencies and study sizes. For application scenarios with extensive multiple testing we integrated an interpolation functionality to determine critical values already based on a moderate number of random draws. AVAILABILITY AND IMPLEMENTATION: The R-package pgainsim is freely available for download on Github at https://github.com/genepi-freiburg/pgainsim. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome-Wide Association Study , Quantitative Trait Loci , Phenotype , Inheritance Patterns , Polymorphism, Single Nucleotide , Software
3.
Theor Popul Biol ; 148: 28-39, 2022 12.
Article in English | MEDLINE | ID: mdl-36208800

ABSTRACT

The concept of individual admixture (IA) assumes that the genome of individuals is composed of alleles inherited from K ancestral populations. Each copy of each allele has the same chance q_k to originate from population k, and together with the allele frequencies p in all populations at all M markers, comprises the admixture model. Here, we assume a supervised scheme, i.e. allele frequencies p are given through a reference database of size N, and q is estimated via maximum likelihood for a single sample. We study laws of large numbers and central limit theorems describing effects of finiteness of both, M and N, on the estimate of q. We recall results for the effect of finite M, and provide a central limit theorem for the effect of finite N, introduce a new way to express the uncertainty in estimates in standard barplots, give simulation results, and discuss applications in forensic genetics.


Subjects
Genetics, Population , Computer Simulation , Gene Frequency , Likelihood Functions , Uncertainty
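
The supervised setting described in the abstract above (known allele frequencies p, unknown ancestry proportions q for one sample) can be illustrated with a generic expectation-maximization iteration for the maximum likelihood estimate. This is a sketch under stated assumptions, not the authors' implementation: it assumes biallelic markers, a diploid sample, genotypes coded as counts of a reference allele, and reference frequencies strictly between 0 and 1. All names are hypothetical.

```python
def estimate_admixture(genotypes, freqs, iterations=200):
    """EM estimate of ancestry proportions q for one diploid sample.

    genotypes[m] in {0, 1, 2}: copies of the reference allele at marker m.
    freqs[m][k]: reference-allele frequency at marker m in population k,
    assumed to lie strictly between 0 and 1.
    """
    num_markers = len(genotypes)
    num_pops = len(freqs[0])
    q = [1.0 / num_pops] * num_pops           # start from equal proportions
    for _ in range(iterations):
        new_q = [0.0] * num_pops
        for g, p in zip(genotypes, freqs):
            # Probability that one allele copy is / is not the reference allele.
            prob_ref = sum(qk * pk for qk, pk in zip(q, p))
            prob_alt = sum(qk * (1.0 - pk) for qk, pk in zip(q, p))
            for k in range(num_pops):
                # Posterior weight that a copy came from population k,
                # added once per observed allele copy of each type.
                if g > 0:
                    new_q[k] += g * q[k] * p[k] / prob_ref
                if g < 2:
                    new_q[k] += (2 - g) * q[k] * (1.0 - p[k]) / prob_alt
        # Each marker contributes two allele copies in total.
        q = [value / (2 * num_markers) for value in new_q]
    return q
```

Each iteration keeps the entries of q non-negative and summing to one, which is why no explicit constraint handling is needed. The sketch treats p as exact, so it does not capture the effect of a finite reference database that the abstract analyses.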
5.
Bioinformatics ; 35(11): 1813-1819, 2019 06 01.
Article in English | MEDLINE | ID: mdl-30395202

ABSTRACT

MOTIVATION: Unique sequence regions are associated with genetic function in vertebrate genomes. However, measuring uniqueness, or absence of long repeats, along a genome is conceptually and computationally difficult. Here we use a variant of the Lempel-Ziv complexity, the match complexity, C_m, and augment it by deriving its null distribution for random sequences. We then apply C_m to the human and mouse genomes to investigate the relationship between sequence complexity and function. RESULTS: We implemented C_m in the program macle and show through simulation that the newly derived null distribution of C_m is accurate. This allows us to delineate high-complexity regions in the human and mouse genomes. Using our program macle2go, we find that these regions are twofold enriched for genes. Moreover, the genes contained in these regions are more than 10-fold enriched for developmental functions. AVAILABILITY AND IMPLEMENTATION: Source code for macle and macle2go is available from www.github.com/evolbioinf/macle and www.github.com/evolbioinf/macle2go, respectively; C_m browser tracks from guanine.evolbio.mgp.de/complexity. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome , Genomics , Animals , Genes, Developmental , Humans , Mammals , Mice , Software
6.
Theor Popul Biol ; 131: 2-11, 2020 02.
Article in English | MEDLINE | ID: mdl-31759974

ABSTRACT

For a panmictic population of constant size evolving under neutrality, Kingman's coalescent describes the genealogy of a population sample in equilibrium. However, for genealogical trees under selection, not even expectations for most basic quantities like height and length of the resulting random tree are known. Here, we give an analytic expression for the distribution of the total tree length of a sample of size n under low levels of selection in a two-alleles model. We can prove that trees are shorter than under neutrality under genic selection and if the beneficial mutant has dominance h < 1/2, but longer for h > 1/2. The difference from neutrality is O(α^2) for genic selection with selection intensity α and O(α) for other modes of dominance.


Subjects
Alleles , Genetics, Population , Models, Genetic , Selection, Genetic , Pedigree
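
For the neutral baseline against which the abstract above compares trees under selection, the total tree length of Kingman's coalescent is easy to simulate. The sketch below covers only the neutral case, with time scaled so that each pair of lineages coalesces at rate 1 (an assumed convention; the abstract does not state its time scale). Function names, the seed, and the number of replicates are arbitrary illustrative choices.

```python
import random


def expected_tree_length(n):
    """Expected total branch length of Kingman's coalescent for n lineages."""
    return 2.0 * sum(1.0 / i for i in range(1, n))


def simulate_tree_length(n, rng):
    """Draw one total tree length under the neutral Kingman coalescent."""
    length = 0.0
    for k in range(n, 1, -1):
        pair_rate = k * (k - 1) / 2.0         # each pair coalesces at rate 1
        waiting_time = rng.expovariate(pair_rate)
        length += k * waiting_time            # k branches grow during this epoch
    return length


rng = random.Random(1)                        # fixed seed, arbitrary choice
draws = [simulate_tree_length(5, rng) for _ in range(20000)]
mean_length = sum(draws) / len(draws)
```

The simulated mean should lie close to expected_tree_length(5); the result in the abstract describes how selection shifts this quantity away from the neutral value.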
7.
J Math Biol ; 77(4): 1153-1191, 2018 10.
Article in English | MEDLINE | ID: mdl-29797051

ABSTRACT

Gene expression is influenced by extrinsic noise (involving a fluctuating environment of cellular processes) and intrinsic noise (referring to fluctuations within a cell under constant environment). We study the standard model of gene expression including an (in-)active gene, mRNA and protein. Gene expression is regulated in the sense that the protein feeds back and either represses (negative feedback) or enhances (positive feedback) its production at the stage of transcription. While it is well-known that negative (positive) feedback reduces (increases) intrinsic noise, we give a precise result on the resulting fluctuations in protein numbers. The technique we use is an extension of the Langevin approximation and is an application of a central limit theorem under stochastic averaging for Markov jump processes (Kang et al. in Ann Appl Probab 24:721-759, 2014). We find that (under our scaling and in equilibrium), negative feedback leads to a reduction in the Fano factor of at most 2, while the noise under positive feedback is potentially unbounded. The fit with simulations is very good and improves on known approximations.


Subjects
Gene Expression Regulation , Models, Genetic , Biochemical Phenomena , Computer Simulation , Feedback, Physiological , Homeostasis/genetics , Markov Chains , Mathematical Concepts , Monte Carlo Method , Protein Biosynthesis , RNA, Messenger/genetics , Stochastic Processes , Transcription, Genetic
9.
J Lipid Res ; 57(5): 882-93, 2016 05.
Article in English | MEDLINE | ID: mdl-27015744

ABSTRACT

Lipoproteins play a key role in the development of CVD, but the dynamics of lipoprotein metabolism are difficult to address experimentally. This article describes a novel two-step combined in vitro and in silico approach that enables the estimation of key reactions in lipoprotein metabolism using just one blood sample. Lipoproteins were isolated by ultracentrifugation from fasting plasma stored at 4°C. Plasma incubated at 37°C is no longer in a steady state, and changes in composition may be determined. From these changes, we estimated rates for reactions like LCAT (56.3 µM/h), β-LCAT (15.62 µM/h), and cholesteryl ester (CE) transfer protein-mediated flux of CE from HDL to IDL/VLDL (21.5 µM/h) based on data from 15 healthy individuals. In a second step, we estimated LDL's HL activity (3.19 pools/day) and, for the very first time, selective CE efflux from LDL (8.39 µM/h) by relying on the previously derived reaction rates. The estimated metabolic rates were then confirmed in an independent group (n = 10). Although measurement uncertainties do not permit us to estimate parameters in individuals, the novel approach we describe here offers the unique possibility to investigate lipoprotein dynamics in various diseases like atherosclerosis or diabetes.


Subjects
Lipoproteins, LDL/blood , Adult , Algorithms , Cholesterol Ester Transfer Proteins/physiology , Computer Simulation , Esterification , Female , Humans , Hydrolysis , Male , Middle Aged , Models, Biological , Phosphatidylcholine-Sterol O-Acyltransferase/physiology , Triglycerides/physiology , Young Adult
10.
Bioinformatics ; 31(8): 1169-75, 2015 Apr 15.
Article in English | MEDLINE | ID: mdl-25504847

ABSTRACT

MOTIVATION: A standard approach to classifying sets of genomes is to calculate their pairwise distances. This is difficult for large samples. We have therefore developed an algorithm for rapidly computing the evolutionary distances between closely related genomes. RESULTS: Our distance measure is based on ungapped local alignments that we anchor through pairs of maximal unique matches of a minimum length. These exact matches can be looked up efficiently using enhanced suffix arrays and our implementation requires approximately only 1 s and 45 MB RAM/Mbase analysed. The pairing of matches distinguishes non-homologous from homologous regions leading to accurate distance estimation. We show this by analysing simulated data and genome samples ranging from 29 Escherichia coli/Shigella genomes to 3085 genomes of Streptococcus pneumoniae. AVAILABILITY AND IMPLEMENTATION: We have implemented the computation of anchor distances in the multithreaded UNIX command-line program andi for ANchor DIstances. C sources and documentation are posted at http://github.com/evolbioinf/andi/ CONTACT: haubold@evolbio.mpg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Biological Evolution , Genome , Genomics/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Animals , Databases, Genetic , Humans , Phylogeny
11.
J Theor Biol ; 364: 355-63, 2015 Jan 07.
Article in English | MEDLINE | ID: mdl-25285895

ABSTRACT

The expression of genes usually follows a two-step procedure. First, a gene (encoded in the genome) is transcribed resulting in a strand of (messenger) RNA. Afterwards, the RNA is translated into protein. We extend the classical stochastic jump model by adding delays (with arbitrary distributions) to transcription and translation. Already in the classical model, production of RNA and protein comes in bursts by activation and deactivation of the gene, resulting in a large variance of the number of RNA and proteins in equilibrium. We derive precise formulas for this second-order structure with the model including delay in equilibrium.


Subjects
Gene Expression Profiling , Gene Expression Regulation , Stochastic Processes , Animals , Bacteria , Binding Sites , Computer Simulation , Markov Chains , Models, Genetic , Oscillometry , Poisson Distribution , Protein Biosynthesis , Proteins/chemistry , RNA/chemistry , Transcription, Genetic
12.
Bioinformatics ; 29(24): 3121-7, 2013 Dec 15.
Article in English | MEDLINE | ID: mdl-24064419

ABSTRACT

MOTIVATION: Why recombination? is one of the central questions in biology. This has led to a host of methods for quantifying recombination from sequence data. These methods are usually based on aligned DNA sequences. Here, we propose an efficient alignment-free alternative. RESULTS: Our method is based on the distribution of match lengths, which we look up using enhanced suffix arrays. By eliminating the alignment step, the test becomes fast enough for application to whole bacterial genomes. Using simulations we show that our test has similar power as established tests when applied to long pairs of sequences. When applied to 58 genomes of Escherichia coli, we pick up the strongest recombination signal from a 125 kb horizontal gene transfer engineered 20 years ago. AVAILABILITY AND IMPLEMENTATION: We have implemented our method in the command-line program rush. Its C sources and documentation are available under the GNU General Public License from http://guanine.evolbio.mpg.de/rush/.


Subjects
Algorithms , Computational Biology , Genome, Bacterial , Recombination, Genetic , Sequence Alignment/methods , Computer Simulation , Escherichia coli/genetics , Phylogeny
13.
Theor Popul Biol ; 87: 25-33, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23064041

ABSTRACT

The ancestral selection graph (ASG) was introduced by Neuhauser and Krone (1997) in order to study populations of constant size which evolve under selection. Coalescence events, which occur at rate 1 for every pair of lines, lead to joint ancestry. In addition, splitting events in the ASG at rate α, the scaled selection coefficient, produce possible ancestors, such that the real ancestor depends on the ancestral alleles. Here, we use the ASG in the case without mutation in order to study fixation of a beneficial mutant. Using our main tool, a reversibility property of the ASG, we provide a new proof of the fact that a beneficial allele fixes roughly in time (2 log α)/α if α is large.


Subjects
Models, Theoretical , Selection, Genetic , Mutation
14.
Theor Popul Biol ; 90: 1-11, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24051161

ABSTRACT

Beneficial mutations can co-occur when population structure slows down adaptation. Here, we consider the process of adaptation in asexual populations distributed over several locations ("islands"). New beneficial mutations arise at constant rate u_b, and each mutation has the same selective advantage s > 0. We assume that populations evolve within islands according to the successional mutations regime of Desai and Fisher (2007), that is, the time to local fixation of a mutation is short compared to the expected waiting time until the next mutation occurs. To study the rate of adaptation, we introduce an approximate model, the successional mutations (SM) model, which can be simulated efficiently and yields accurate results for a wide range of parameters. In the SM model, mutations fix instantly within islands, and migrants can take over the destination island if they are fitter than the residents. For the special case of a population distributed equally across two islands with population size N, we approximate the model further for small and large migration rates in comparison to the mutation rate. These approximations lead to explicit formulas for the rate of adaptation which fit the original model for a large range of parameter values. For the d island case we provide some heuristics on how to extend the explicit formulas and check these with computer simulations. We conclude that the SM model is a good approximation of the adaptation process in a structured population, at least if mutation or migration is limited.


Subjects
Adaptation, Physiological , Population Dynamics , Models, Theoretical , Mutation
15.
Bioinformatics ; 27(4): 449-55, 2011 Feb 15.
Article in English | MEDLINE | ID: mdl-21156730

ABSTRACT

MOTIVATION: Sequencing capacity is currently growing more rapidly than CPU speed, leading to an analysis bottleneck in many genome projects. Alignment-free sequence analysis methods tend to be more efficient than their alignment-based counterparts. They may, therefore, be important in the long run for keeping sequence analysis abreast with sequencing. RESULTS: We derive and implement an alignment-free estimator of the number of pairwise mismatches, π_m. Our implementation of π_m, pim, is based on an enhanced suffix array and inherits the superior time and memory efficiency of this data structure. Simulations demonstrate that π_m is accurate if mutations are distributed randomly along the chromosome. While real data often deviates from this ideal, π_m remains useful for identifying regions of low genetic diversity using a sliding window approach. We demonstrate this by applying it to the complete genomes of 37 strains of Drosophila melanogaster, and to the genomes of two closely related Drosophila species, D. simulans and D. sechellia. In both cases, we detect the diversity minimum and discuss its biological implications.


Subjects
Computational Biology/methods , Genetic Variation , Sequence Analysis, DNA/methods , Algorithms , Animals , Computer Simulation , Drosophila/genetics , Genome, Insect , Recombination, Genetic
16.
Forensic Sci Int Genet ; 56: 102593, 2022 01.
Article in English | MEDLINE | ID: mdl-34735936

ABSTRACT

The inference of biogeographic ancestry (BGA) has become a focus of forensic genetics. Misinference of BGA can have profound unwanted consequences for investigations and society. We show that recent admixture can lead to misclassification and erroneous inference of ancestry proportions, using state of the art analysis tools with (i) simulations, (ii) 1000 genomes project data, and (iii) two individuals analyzed using the ForenSeq DNA Signature Prep Kit. Subsequently, we extend existing tools for estimation of individual ancestry (IA) by allowing for different IA in both parents, leading to estimates of parental individual ancestry (PIA), and a statistical test for recent admixture. Estimation of PIA outperforms IA in most scenarios of recent admixture. Furthermore, additional information about parental ancestry can be acquired with PIA that may guide casework.


Subjects
Genetics, Population , Polymorphism, Single Nucleotide , Genotype , Humans
17.
Genetics ; 182(1): 205-16, 2009 May.
Article in English | MEDLINE | ID: mdl-19237689

ABSTRACT

Using coalescent simulations, we study the impact of three different sampling schemes on patterns of neutral diversity in structured populations. Specifically, we are interested in two summary statistics based on the site frequency spectrum as a function of migration rate, demographic history of the entire substructured population (including timing and magnitude of specieswide expansions), and the sampling scheme. Using simulations implementing both finite-island and two-dimensional stepping-stone spatial structure, we demonstrate strong effects of the sampling scheme on Tajima's D (D(T)) and Fu and Li's D (D(FL)) statistics, particularly under specieswide (range) expansions. Pooled samples yield average D(T) and D(FL) values that are generally intermediate between those of local and scattered samples. Local samples (and to a lesser extent, pooled samples) are influenced by local, rapid coalescence events in the underlying coalescent process. These processes result in lower proportions of external branch lengths and hence lower proportions of singletons, explaining our finding that the sampling scheme affects D(FL) more than it does D(T). Under specieswide expansion scenarios, these effects of spatial sampling may persist up to very high levels of gene flow (Nm > 25), implying that local samples cannot be regarded as being drawn from a panmictic population. Importantly, many data sets on humans, Drosophila, and plants contain signatures of specieswide expansions and effects of sampling scheme that are predicted by our simulation results. This suggests that validating the assumption of panmixia is crucial if robust demographic inferences are to be made from local or pooled samples. However, future studies should consider adopting a framework that explicitly accounts for the genealogical effects of population subdivision and empirical sampling schemes.


Subjects
Drosophila melanogaster , Genetic Variation , Genetics, Population , Linkage Disequilibrium , Models, Genetic , Solanum lycopersicum , Animals , Computer Simulation , Demography , Drosophila melanogaster/genetics , Solanum lycopersicum/genetics , Humans
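
Tajima's D, one of the two summary statistics studied in the abstract above, can be computed directly from an unfolded site frequency spectrum. The sketch below uses the standard constants from Tajima (1989); the function name and the list layout of the spectrum are assumptions for illustration, and it is not the simulation code used in the study.

```python
import math


def tajimas_d(sfs):
    """Tajima's D from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of sites where the derived allele occurs in
    i of the n sampled sequences (i = 1..n-1). Requires n >= 4 and at
    least one segregating site, otherwise the variance term is zero.
    """
    n = len(sfs) + 1
    num_sites = sum(sfs)
    if n < 4 or num_sites == 0:
        raise ValueError("need n >= 4 and at least one segregating site")

    # Mean number of pairwise differences.
    pairs = n * (n - 1) / 2.0
    pi = sum(c * i * (n - i) for i, c in enumerate(sfs, start=1)) / pairs

    # Constants of the variance approximation.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * num_sites + e2 * num_sites * (num_sites - 1)
    return (pi - num_sites / a1) / math.sqrt(variance)
```

A spectrum proportional to 1/i, the neutral expectation at constant population size, gives D close to zero; an excess of singletons, as expected after an expansion, pushes D negative.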
18.
Mol Ecol ; 19 Suppl 1: 277-84, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20331786

ABSTRACT

Improvements in sequencing technology over the past 5 years are leading to routine application of shotgun sequencing in the fields of ecology and evolution. However, the theory to estimate evolutionary parameters from these data is still being worked out. Here we present an extension and implementation of part of this theory, mlRho. This program can efficiently compute the following three maximum likelihood estimators based on shotgun sequence data obtained from single diploid individuals: the population mutation rate (4N_e μ), the sequencing error rate, and the population recombination rate (4N_e c). We demonstrate the accuracy of mlRho by applying it to simulated data sets. In addition, we analyse the genomes of the sea squirt Ciona intestinalis and the water flea Daphnia pulex. Ciona intestinalis is an obligate outcrosser, while D. pulex is a cyclic parthenogen, and we discuss how these contrasting life histories are reflected in our parameter estimates. The program mlRho is freely available from http://guanine.evolbio.mpg.de/mlRho.


Subjects
DNA Mutational Analysis/methods , Genetics, Population , Genomics/methods , Recombination, Genetic , Software , Animals , Ciona intestinalis/genetics , Computational Biology/methods , Computer Simulation , Daphnia/genetics , Diploidy , Genome , Likelihood Functions , Models, Genetic
19.
G3 (Bethesda) ; 10(1): 211-223, 2020 01 07.
Article in English | MEDLINE | ID: mdl-31699776

ABSTRACT

With up to millions of nearly neutral polymorphisms now being routinely sampled in population-genomic surveys, it is possible to estimate the site-frequency spectrum of such sites with high precision. Each frequency class reflects a mixture of potentially unique demographic histories, which can be revealed using theory for the probability distributions of the starting and ending points of branch segments over all possible coalescence trees. Such distributions are completely independent of past population history, which only influences the segment lengths, providing the basis for estimating average population sizes separating tree-wide coalescence events. The history of population-size change experienced by a sample of polymorphisms can then be dissected in a model-flexible fashion, and extension of this theory allows estimation of the mean and full distribution of long-term effective population sizes and ages of alleles of specific frequencies. Here, we outline the basic theory underlying the conceptual approach, develop and test an efficient statistical procedure for parameter estimation, and apply this to multiple population-genomic datasets for the microcrustacean Daphnia pulex.


Subjects
Biomass , Models, Genetic , Polymorphism, Single Nucleotide , Animals , Daphnia/genetics , Daphnia/growth & development
20.
Forensic Sci Int Genet ; 46: 102259, 2020 05.
Article in English | MEDLINE | ID: mdl-32105949

ABSTRACT

Inference of the Biogeographical Ancestry (BGA) of a person or trace relies on three ingredients: (1) a reference database of DNA samples including BGA information; (2) a statistical clustering method; (3) a set of loci which segregate dependent on geographical location, i.e. a set of so-called Ancestry Informative Markers (AIMs). We used the theory of feature selection from statistical learning in order to obtain AIMsets for BGA inference. Using simulations, we show that this learning procedure works in various cases, and outperforms ad hoc methods, based on statistics like F_ST or informativeness for the choice of AIMs. Applying our method to data from the 1000 genomes project (excluding Admixed Americans) we identified an AIMset of 12 SNPs, which gives a vanishing misclassification error on a continental scale, as do other published AIMsets. In fact, cross validation shows that there exists a multitude of sets with comparable performance to the optimal AIMset. On a sub-continental scale, we find a set of 55 SNPs for distinguishing the five European populations. The misclassification error is reduced by a factor of two relative to published AIMsets, but is still 30% and therefore too large in order to be useful in forensic applications.


Subjects
Databases, Genetic , Genetic Markers , Polymorphism, Single Nucleotide , Racial Groups/genetics , Forensic Genetics , Humans , Models, Genetic , Models, Statistical