1.
ArXiv ; 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38699164

ABSTRACT

Biological sequences do not occur at random. Instead, they appear with particular frequencies that reflect properties of the associated system or phenomenon. Knowing how biological sequences are distributed in sequence space is thus a natural first step toward understanding the underlying mechanisms. Here we propose a new method for inferring the probability distribution from which a sample of biological sequences was drawn for the case where the sequences are composed of elements that admit a natural ordering. Our method is based on Bayesian field theory, a physics-based machine learning approach, and can be regarded as a nonparametric extension of the traditional maximum entropy estimate. As an example, we use it to analyze the aneuploidy data pertaining to gliomas from The Cancer Genome Atlas project. In addition, we demonstrate two follow-up analyses that can be performed with the resulting probability distribution. One is to investigate the associations among the sequence sites, which provides a way to infer the governing biological grammar. The other is to study the global geometry of the probability landscape, which allows us to look at the problem from an evolutionary point of view. This methodology thus enables us to learn from a sample of sequences how a biological system or phenomenon works in the real world.

2.
bioRxiv ; 2024 May 13.
Article in English | MEDLINE | ID: mdl-38798671

ABSTRACT

Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
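
To make the notion of a gauge freedom concrete, the sketch below uses a toy additive model and the familiar zero-sum gauge. The parameter values, the two-letter alphabet, and the function names are hypothetical and chosen only for illustration. This is not the family of gauges derived in the paper, only an example of parameters changing while predictions stay fixed.

```python
from itertools import product

def predict(theta0, theta, seq):
    """Additive model: a constant term plus one term per position and character."""
    return theta0 + sum(theta[l][c] for l, c in enumerate(seq))

def zero_sum_gauge(theta0, theta):
    """Shift each position's parameters to mean zero, absorbing the shift into theta0."""
    new_theta0 = theta0
    new_theta = []
    for pos in theta:
        mean = sum(pos.values()) / len(pos)
        new_theta0 += mean
        new_theta.append({c: v - mean for c, v in pos.items()})
    return new_theta0, new_theta

# Hypothetical parameters for a length-2 model over the alphabet "AC".
theta0 = 1.0
theta = [{"A": 0.5, "C": 1.5}, {"A": -2.0, "C": 0.0}]
fixed0, fixed = zero_sum_gauge(theta0, theta)

# Predictions are identical in both parameterizations for every sequence.
all_seqs = ["".join(p) for p in product("AC", repeat=2)]
max_diff = max(abs(predict(theta0, theta, s) - predict(fixed0, fixed, s)) for s in all_seqs)
```

After fixing the gauge, each per-position parameter can be read as a deviation from the average over characters at that position, which is what makes the values comparable across positions.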

3.
Bioinformatics ; 39(1), 2023 01 01.
Article in English | MEDLINE | ID: mdl-36355460

ABSTRACT

MOTIVATION: Multiple sequence alignments (MSAs) of homologous sequences contain information on structural and functional constraints and their evolutionary histories. Despite their importance for many downstream tasks, such as structure prediction, MSA generation is often treated as a separate pre-processing step, without any guidance from the application it will be used for. RESULTS: Here, we implement a smooth and differentiable version of the Smith-Waterman pairwise alignment algorithm that enables jointly learning an MSA and a downstream machine learning system in an end-to-end fashion. To demonstrate its utility, we introduce SMURF (Smooth Markov Unaligned Random Field), a new method that jointly learns an alignment and the parameters of a Markov Random Field for unsupervised contact prediction. We find that SMURF learns MSAs that mildly improve contact prediction on a diverse set of protein and RNA families. As a proof of concept, we demonstrate that by connecting our differentiable alignment module to AlphaFold2 and maximizing predicted confidence, we can learn MSAs that improve structure predictions over the initial MSAs. Interestingly, the alignments that improve AlphaFold predictions are self-inconsistent and can be viewed as adversarial. This work highlights the potential of differentiable dynamic programming to improve neural network pipelines that rely on an alignment and the potential dangers of optimizing predictions of protein sequences with methods that are not fully understood. AVAILABILITY AND IMPLEMENTATION: Our code and examples are available at: https://github.com/spetti/SMURF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
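
The idea of smoothing Smith-Waterman can be illustrated by replacing every `max` in the classic recursion with a temperature-scaled log-sum-exp. The sketch below is a simplified, score-only version under assumed scoring parameters (`match`, `mismatch`, and a linear `gap` penalty are hypothetical choices). It is not the SMURF implementation and computes no gradients.

```python
import math

def soft_max(values, temperature):
    """Temperature-scaled log-sum-exp, a smooth upper bound on max(values)."""
    m = max(values)
    return m + temperature * math.log(
        sum(math.exp((v - m) / temperature) for v in values)
    )

def smooth_sw_score(a, b, temperature=1.0, match=2.0, mismatch=-1.0, gap=1.0):
    """Smith-Waterman local alignment score with each max replaced by soft_max."""
    h = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    cells = [0.0]  # the empty alignment scores zero
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = soft_max(
                [0.0, h[i - 1][j - 1] + s, h[i - 1][j] - gap, h[i][j - 1] - gap],
                temperature,
            )
            cells.append(h[i][j])
    # The final score is a smooth maximum over all cells.
    return soft_max(cells, temperature)

nearly_hard = smooth_sw_score("ACGT", "ACGT", temperature=1e-3)
smooth = smooth_sw_score("ACGT", "ACGT", temperature=1.0)
```

As the temperature approaches zero the smooth score approaches the classic score (8 for this toy pair under the assumed scoring). At higher temperatures every cell contributes, which is what makes the score differentiable with respect to the substitution scores.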


Subjects
Algorithms , Proteins , Humans , Sequence Alignment , Proteins/chemistry , Computer Neural Networks , Amino Acid Sequence
4.
Proc Natl Acad Sci U S A ; 119(39): e2204233119, 2022 09 27.
Article in English | MEDLINE | ID: mdl-36129941

ABSTRACT

Contemporary high-throughput mutagenesis experiments are providing an increasingly detailed view of the complex patterns of genetic interaction that occur between multiple mutations within a single protein or regulatory element. By simultaneously measuring the effects of thousands of combinations of mutations, these experiments have revealed that the genotype-phenotype relationship typically reflects not only genetic interactions between pairs of sites but also higher-order interactions among larger numbers of sites. However, modeling and understanding these higher-order interactions remains challenging. Here we present a method for reconstructing sequence-to-function mappings from partially observed data that can accommodate all orders of genetic interaction. The main idea is to make predictions for unobserved genotypes that match the type and extent of epistasis found in the observed data. This information on the type and extent of epistasis can be extracted by considering how phenotypic correlations change as a function of mutational distance, which is equivalent to estimating the fraction of phenotypic variance due to each order of genetic interaction (additive, pairwise, three-way, etc.). Using these estimated variance components, we then define an empirical Bayes prior that in expectation matches the observed pattern of epistasis and reconstruct the genotype-phenotype mapping by conducting Gaussian process regression under this prior. To demonstrate the power of this approach, we present an application to the antibody-binding domain GB1 and also provide a detailed exploration of a dataset consisting of high-throughput measurements for the splicing efficiency of human pre-mRNA 5' splice sites, for which we also validate our model predictions via additional low-throughput experiments.
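
The statement that phenotypic correlations change with mutational distance can be sketched with a simple empirical estimate: group all observed pairs of genotypes by Hamming distance and compute the correlation of their phenotypes within each group. The function names and the toy additive landscape below are assumptions for illustration, and the paper's actual estimator may differ.

```python
from collections import defaultdict
from itertools import combinations, product

def hamming(a, b):
    """Number of sites at which two equal-length genotypes differ."""
    return sum(x != y for x, y in zip(a, b))

def distance_correlation(phenotypes):
    """Map each Hamming distance d to the empirical phenotypic correlation
    over all observed pairs of genotypes at distance d."""
    mean = sum(phenotypes.values()) / len(phenotypes)
    var = sum((y - mean) ** 2 for y in phenotypes.values()) / len(phenotypes)
    sums, counts = defaultdict(float), defaultdict(int)
    for (g1, y1), (g2, y2) in combinations(phenotypes.items(), 2):
        d = hamming(g1, g2)
        sums[d] += (y1 - mean) * (y2 - mean)
        counts[d] += 1
    return {d: sums[d] / counts[d] / var for d in sorted(counts)}

# Toy, fully observed additive landscape on three binary sites:
# the phenotype is the number of "1" characters.
toy = {"".join(g): float(g.count("1")) for g in product("01", repeat=3)}
corr = distance_correlation(toy)
```

For this toy additive landscape the correlation falls linearly with distance (1/3, -1/3, -1 at distances 1, 2, 3). Departures from such a simple decay in real data are what carry information about higher-order interactions.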


Subjects
Genetic Epistasis , RNA Precursors , Bayes Theorem , Chromosome Mapping , Computational Biology , Genotype , Humans , Genetic Models , Mutation , Phenotype , RNA Splicing
5.
Proc Natl Acad Sci U S A ; 118(40), 2021 10 05.
Article in English | MEDLINE | ID: mdl-34599093

ABSTRACT

Density estimation in sequence space is a fundamental problem in machine learning that is also of great importance in computational biology. Due to the discrete nature and large dimensionality of sequence space, how best to estimate such probability distributions from a sample of observed sequences remains unclear. One common strategy for addressing this problem is to estimate the probability distribution using maximum entropy (i.e., calculating point estimates for some set of correlations based on the observed sequences and predicting the probability distribution that is as uniform as possible while still matching these point estimates). Building on recent advances in Bayesian field-theoretic density estimation, we present a generalization of this maximum entropy approach that provides greater expressivity in regions of sequence space where data are plentiful while still maintaining a conservative maximum entropy character in regions of sequence space where data are sparse or absent. In particular, we define a family of priors for probability distributions over sequence space with a single hyperparameter that controls the expected magnitude of higher-order correlations. This family of priors then results in a corresponding one-dimensional family of maximum a posteriori estimates that interpolate smoothly between the maximum entropy estimate and the observed sample frequencies. To demonstrate the power of this method, we use it to explore the high-dimensional geometry of the distribution of 5' splice sites found in the human genome and to understand patterns of chromosomal abnormalities across human cancers.
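
The two ends of the one-parameter family described above can be sketched directly. At one end is a maximum entropy estimate, illustrated here by its simplest case of matching only single-site frequencies, where the estimate is the product of the site marginals. At the other end are the raw sample frequencies. The toy sample and function names are hypothetical, and the interpolation between the two ends (the paper's actual contribution) is not implemented here.

```python
from collections import Counter
from itertools import product

def empirical_frequencies(sample):
    """Observed frequency of each distinct sequence in the sample."""
    counts = Counter(sample)
    return {s: n / len(sample) for s, n in counts.items()}

def independent_site_maxent(sample, alphabet):
    """Maximum entropy distribution constrained only by single-site frequencies,
    which is the product of the per-site marginal frequencies."""
    length = len(sample[0])
    n = len(sample)
    marginals = [Counter(s[l] for s in sample) for l in range(length)]
    probs = {}
    for chars in product(alphabet, repeat=length):
        p = 1.0
        for l, c in enumerate(chars):
            p *= marginals[l][c] / n
        probs["".join(chars)] = p
    return probs

# Toy sample in which the two sites are perfectly correlated.
sample = ["AA", "AA", "CC", "CC"]
maxent = independent_site_maxent(sample, "AC")
empirical = empirical_frequencies(sample)
```

In this toy sample the maximum entropy estimate spreads probability evenly over all four sequences, while the empirical frequencies assign zero to the unobserved `AC` and `CA`. An estimate that moves between these extremes according to how much data is available is the behaviour the abstract describes.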


Subjects
Aneuploidy , Computational Biology/methods , Theoretical Models , Neoplasms/genetics , RNA Splice Sites , Humans , Probability
6.
Nat Commun ; 11(1): 1782, 2020 04 14.
Article in English | MEDLINE | ID: mdl-32286265

ABSTRACT

Massively parallel phenotyping assays have provided unprecedented insight into how multiple mutations combine to determine biological function. While such assays can measure phenotypes for thousands to millions of genotypes in a single experiment, in practice these measurements are not exhaustive, so there is a need for techniques to impute values for genotypes whose phenotypes have not been directly assayed. Here, we present an imputation method based on inferring the least epistatic possible sequence-function relationship compatible with the data. In particular, we infer the reconstruction where mutational effects change as little as possible across adjacent genetic backgrounds. The resulting models can capture complex higher-order genetic interactions near the data, but approach additivity where data are sparse or absent. We apply the method to high-throughput transcription factor binding assays and use it to explore a fitness landscape for protein G.
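
One way to quantify how much mutational effects change across adjacent backgrounds is the local epistatic coefficient of a double mutant, summed in squared form over all pairs of sites and backgrounds. The sketch below does this for a complete landscape on binary genotypes. The function names and toy landscapes are hypothetical, and the exact cost function and normalization used in the paper are not stated in the abstract.

```python
from itertools import combinations

def flip(genotype, site):
    """Return the genotype with the binary character at `site` flipped."""
    flipped = "1" if genotype[site] == "0" else "0"
    return genotype[:site] + flipped + genotype[site + 1:]

def total_squared_epistasis(landscape, length):
    """Sum of squared local epistatic coefficients over all double-mutant squares."""
    total = 0.0
    for g in landscape:
        for i, j in combinations(range(length), 2):
            # Visit each square once, from the corner that is "0" at both sites.
            if g[i] == "0" and g[j] == "0":
                gi, gj = flip(g, i), flip(g, j)
                gij = flip(gi, j)
                eps = landscape[gij] - landscape[gi] - landscape[gj] + landscape[g]
                total += eps ** 2
    return total

# Toy landscapes on two binary sites: one additive, one with a pairwise interaction.
additive = {"00": 0.0, "01": 1.0, "10": 2.0, "11": 3.0}
epistatic = {"00": 0.0, "01": 1.0, "10": 2.0, "11": 4.0}
cost_additive = total_squared_epistasis(additive, 2)
cost_epistatic = total_squared_epistasis(epistatic, 2)
```

An additive landscape has zero cost under this measure. Choosing unobserved values to minimize such a cost would therefore push predictions toward additivity wherever the data do not demand otherwise.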


Subjects
Genetic Epistasis/genetics , Theoretical Models , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Computational Biology/methods , Genotype , Least-Squares Analysis
7.
Evolution ; 74(7): 1321-1334, 2020 07.
Article in English | MEDLINE | ID: mdl-32202323

ABSTRACT

Conflicting selection is an important evolutionary mechanism because it impedes directional evolution and helps to maintain phenotypic variation. It can arise when mutualistic and antagonistic selective agents exert opposing selection on the same trait and when distinct phenotypic optima are favored by different fitness components. In this study, we test for conflicting selection through different sexual functions of the hermaphroditic plant Silene stellata during its early and late flowering seasons. We find selection is consistently stronger during the early flowering season, which aligns with the activity peak of the pollinating seed predator Hadena ectypa. Importantly, we observe sex-specific selection on petal dimensions to have opposite signs. We propose that the observed sexually conflicting selection on petal design results from the negative selection through female function for the avoidance of oviposition and the subsequent fruit predation by H. ectypa larvae and the positive selection through male function for pollen export by H. ectypa adults. The Silene-Hadena interaction has previously been considered to be largely parasitic. Our findings suggest a trade-off mechanism that could thwart the evolution of an "escape route" from the nocturnal pollination syndrome by Silene spp. and contribute to the long-term maintenance of the Silene-Hadena system.


Subjects
Pollination , Genetic Selection , Silene/genetics , Animals , Flowers/physiology , Moths , Oviposition
8.
Am J Bot ; 105(10): 1643-1652, 2018 10.
Article in English | MEDLINE | ID: mdl-30276803

ABSTRACT

PREMISE OF THE STUDY: Nursery pollination systems can range from obligate to facultative. In a system where generalists provide substantial pollination service, an important question is whether the cost of seed predation outweighs the benefit provided by the nursery pollinator to cause the plant to evolve toward more generalized pollination. Using a facultative system native to North America, we tested whether nursery pollinators vs. strictly mutualistic generalists affect mating-system parameters of the host plant and explored the implications for long-term coevolution. METHODS: We used paternity analyses with 11 microsatellite markers to characterize the mating system of Silene stellata when pollination service is primarily through the nursery pollinator Hadena ectypa and generalist moths. KEY RESULTS: Our experimental population of S. stellata was predominantly outcrossing (average outcrossing rate t = 0.83), and mating-system parameters were similar between pollinator groups. We detected significant correlations in both selfing and outcrossed paternity at the fruit and maternal family level, corresponding to limited pollen dispersal (mean = 3.9 m). Among individuals, variation in anther-stigma separation was positively associated with outcrossing rate, which suggests the importance of herkogamy in preventing selfing. CONCLUSIONS: Correlated paternity suggests that seeds from the same fruit and/or plant are sired by a limited number of pollen donors, resulting from low pollen dispersal and potential male-male competition. The similar mating-system parameters of the two pollinator groups suggest that selection for higher outcrossing in S. stellata is likely to be through floral design rather than through increased pollinator specialization with H. ectypa.


Subjects
Pollination , Silene/physiology , Reproduction , Asexual Reproduction , Silene/genetics , Tetraploidy , Virginia
9.
Genes (Basel) ; 9(9), 2018 Aug 21.
Article in English | MEDLINE | ID: mdl-30134605

ABSTRACT

A now classical argument for the marginal thermodynamic stability of proteins explains the distribution of observed protein stabilities as a consequence of an entropic pull in protein sequence space. In particular, most sequences that are sufficiently stable to fold will have stabilities near the folding threshold. Here, we extend this argument to consider its predictions for epistatic interactions for the effects of mutations on the free energy of folding. Although there is abundant evidence to indicate that the effects of mutations on the free energy of folding are nearly additive and conserved over evolutionary time, we show that these observations are compatible with the hypothesis that a non-additive contribution to the folding free energy is essential for observed proteins to maintain their native structure. In particular, through both simulations and analytical results, we show that even very small departures from additivity are sufficient to drive this effect.

10.
Ann Bot ; 122(4): 593-603, 2018 09 24.
Article in English | MEDLINE | ID: mdl-29850821

ABSTRACT

Background and Aims: Population genetic structures and patterns of gene flow of interacting species provide important insights into the spatial scale of their interactions and the potential for local co-adaptation. We analysed the genetic structures of the plant Silene stellata and the nocturnal moth Hadena ectypa. Hadena ectypa acts as one of the important pollinators of S. stellata as well as being an obligate seed parasite on the plant. Although H. ectypa provides a substantial pollination service to S. stellata, this system is largely considered parasitic due to the severe seed predation by the Hadena larvae. Previous research on this system has found variable interaction outcomes across space, indicating the potential for a geographical selection mosaic. Methods: Using 11 microsatellite markers for S. stellata and nine markers for H. ectypa, we analysed the population genetic structure and the patterns and intensity of gene flow within and among three local populations in the Appalachians. Key Results: We found no spatial genetic structure in the moth populations, while significant differentiation was detected among the local plant populations. Additionally, we observed that gene flow rates among H. ectypa populations were more uniform and that the mean gene flow rate in H. ectypa was twice as large as that in S. stellata. Conclusions: Our results suggest that although the moths move frequently among populations, long-distance pollen carryover only happens occasionally. The difference in gene flow rates between S. stellata and H. ectypa could prevent strict local co-adaptation. Furthermore, higher gene flow rates in H. ectypa could also increase resistance of the local S. stellata populations to the parasitic effect of H. ectypa and therefore help to stabilize the Silene-Hadena interaction dynamics.


Subjects
Gene Flow , Population Genetics , Host-Parasite Interactions , Moths/physiology , Silene/genetics , Animals , Microsatellite Repeats/genetics , Moths/genetics , Pollen/genetics , Pollen/parasitology , Pollination , Seeds/genetics , Seeds/parasitology , Silene/parasitology
11.
Appl Plant Sci ; 4(12), 2016 Dec.
Article in English | MEDLINE | ID: mdl-28101439

ABSTRACT

PREMISE OF THE STUDY: We designed and tested microsatellite markers for the North American native species Silene stellata (Caryophyllaceae) to investigate its population genetic structure and identify selection on floral design through male reproductive success. METHODS AND RESULTS: A total of 153 candidate microsatellite loci were isolated based on next-generation sequencing. We identified 18 polymorphic microsatellite loci in three populations of S. stellata, with di- or trinucleotide repeats. Genotyping results showed the number of alleles per locus ranged from six to 45 and expected heterozygosity ranged from 0.511 to 0.951. Five of these loci were successfully amplified in S. virginica and S. caroliniana and were also polymorphic. CONCLUSIONS: The microsatellite markers reported here provide a valuable tool for paternity analysis in S. stellata. They will also be useful for investigating the population genetic structures of S. stellata and related species.
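
Expected heterozygosity at a locus is conventionally computed as one minus the sum of squared allele frequencies. The sketch below shows that calculation together with a common small-sample correction. The abstract does not say which form was used, and the allele labels are hypothetical fragment sizes chosen for illustration.

```python
from collections import Counter

def expected_heterozygosity(alleles):
    """One minus the sum of squared allele frequencies at a single locus."""
    n = len(alleles)
    freqs = [count / n for count in Counter(alleles).values()]
    return 1.0 - sum(p * p for p in freqs)

def unbiased_expected_heterozygosity(alleles):
    """Small-sample correction: scale by n / (n - 1), with n sampled gene copies."""
    n = len(alleles)
    return expected_heterozygosity(alleles) * n / (n - 1)

# Hypothetical allele calls (fragment sizes) for four sampled gene copies.
toy_alleles = ["152", "152", "155", "155"]
he = expected_heterozygosity(toy_alleles)
he_unbiased = unbiased_expected_heterozygosity(toy_alleles)
```

Because the uncorrected value is bounded above by 1 - 1/k for k alleles, values as high as the reported maximum of 0.951 require many alleles at comparable frequencies, consistent with loci carrying up to 45 alleles.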
