Search | BVS Ecuador Search Portal

Interpreting the pervasive observation of U-shaped Site Frequency Spectra.

Freund, Fabian; Kerdoncuff, Elise; Matuszewski, Sebastian; Lapierre, Marguerite; Hildebrandt, Marcel; Jensen, Jeffrey D; Ferretti, Luca; Lambert, Amaury; Sackton, Timothy B; Achaz, Guillaume.

PLoS Genet; 19(3): e1010677, 2023 Mar.

Article in English | MEDLINE | ID: mdl-36952570

ABSTRACT

The standard neutral model of molecular evolution has traditionally been used as the null model for population genomics. We gathered a collection of 45 genome-wide site frequency spectra from a diverse set of species, most of which display an excess of low and high frequency variants compared to the expectation of the standard neutral model, resulting in U-shaped spectra. We show that multiple merger coalescent models often provide a better fit to these observations than the standard Kingman coalescent. Hence, in many circumstances these under-utilized models may serve as the more appropriate reference for genomic analyses. We further discuss the underlying evolutionary processes that may result in the widespread U-shape of frequency spectra.

Subject(s)

Biological Evolution, Molecular Evolution, Genetic Models
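
As a rough illustration of the comparison described in the abstract above: under the standard neutral model (Kingman coalescent, constant population size), the expected number of sites at which the derived allele appears i times in a sample of n sequences is theta / i, so the unfolded spectrum decreases monotonically. The sketch below computes that expectation and applies a crude U-shape check. The function names, the U-shape criterion, and the toy counts are assumptions made for illustration; they are not taken from the paper.

```python
def kingman_expected_sfs(n, theta=1.0):
    """Expected unfolded SFS under the standard neutral model.

    Class i (i = 1 .. n-1) has expectation theta / i.
    """
    return [theta / i for i in range(1, n)]


def is_u_shaped(sfs):
    """Crude check (an assumption, not the paper's criterion):
    both the lowest and the highest frequency class exceed the
    smallest interior class."""
    if len(sfs) < 3:
        raise ValueError("need at least three frequency classes")
    interior_min = min(sfs[1:-1])
    return sfs[0] > interior_min and sfs[-1] > interior_min


def relative_excess(observed, n, theta=1.0):
    """Per-class ratio of observed to expected proportions.

    Values above 1 mark classes that are over-represented relative
    to the standard neutral expectation.
    """
    expected = kingman_expected_sfs(n, theta)
    obs_total = sum(observed)
    exp_total = sum(expected)
    return [(o / obs_total) / (e / exp_total)
            for o, e in zip(observed, expected)]


# Hypothetical counts for a sample of n = 6 sequences (made-up data).
toy_sfs = [50, 20, 10, 12, 30]
print(is_u_shaped(toy_sfs))
print(relative_excess(toy_sfs, n=6))
```

The neutral expectation itself never passes the U-shape check, because its highest frequency class is always its smallest.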

Task-driven knowledge graph filtering improves prioritizing drugs for repurposing.

Ratajczak, Florin; Joblin, Mitchell; Ringsquandl, Martin; Hildebrandt, Marcel.

BMC Bioinformatics; 23(1): 84, 2022 Mar 04.

Article in English | MEDLINE | ID: mdl-35246025

ABSTRACT

BACKGROUND: Drug repurposing aims at finding new targets for already developed drugs. It becomes more relevant as the cost of discovering new drugs steadily increases. To find new potential targets for a drug, an abundance of methods and existing biomedical knowledge from different domains can be leveraged. Recently, knowledge graphs have emerged in the biomedical domain that integrate information about genes, drugs, diseases and other biological domains. Knowledge graphs can be used to predict new connections between compounds and diseases, leveraging the interconnected biomedical data around them. While real-world use cases such as drug repurposing are only interested in one specific relation type, widely used knowledge graph embedding models simultaneously optimize over all relation types in the graph. This can lead the models to underfit the data that is most relevant for the desired relation type. For example, if we want to learn embeddings to predict links between compounds and diseases but almost all relations in the graph are incident to other pairs of entity types, then the resulting embeddings are likely not optimized to predict links between compounds and diseases. We propose a method that leverages domain knowledge in the form of metapaths and uses them to filter two biomedical knowledge graphs (Hetionet and DRKG), with the aim of improving performance on the drug repurposing prediction task while simultaneously increasing computational efficiency.
RESULTS: We find that our method reduces the number of entities by 60% on Hetionet and 26% on DRKG, while leading to an improvement in prediction performance of up to 40.8% on Hetionet and 14.2% on DRKG, with an average improvement of 20.6% on Hetionet and 8.9% on DRKG. Additionally, prioritization of antiviral compounds for SARS-CoV-2 improves after task-driven filtering is applied.
CONCLUSION: Knowledge graphs contain facts that are counterproductive for specific tasks, in our case drug repurposing. We also demonstrate that these facts can be removed, resulting in improved performance on that task and a more efficient learning process.

Subject(s)

COVID-19, Automated Pattern Recognition, Algorithms, Drug Repositioning, Humans, SARS-CoV-2
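
The idea of filtering a knowledge graph with metapaths, as described in the abstract above, can be sketched on a toy graph. This is a minimal sketch under stated assumptions, not the authors' implementation: a metapath is taken to be a sequence of entity types, edges are treated as undirected, and a triple is kept only if it lies on at least one walk whose node types match the metapath from start to end. The function name `filter_by_metapath` and all entity and relation names are hypothetical.

```python
def filter_by_metapath(triples, node_type, metapath):
    """Keep triples lying on a walk whose node types follow `metapath`.

    triples   : list of (head, relation, tail)
    node_type : dict mapping every entity to its type
    metapath  : sequence of entity types, e.g. ("Compound", "Gene", "Disease")
    Edges are treated as undirected (an assumption of this sketch).
    """
    k = len(metapath) - 1

    # Forward pass: nodes reachable at step j by a type-matching walk.
    forward = [{n for n, t in node_type.items() if t == metapath[0]}]
    for j in range(1, k + 1):
        reached = set()
        for h, _, t in triples:
            for a, b in ((h, t), (t, h)):
                if a in forward[j - 1] and node_type[b] == metapath[j]:
                    reached.add(b)
        forward.append(reached)

    # Backward pass: keep only nodes that can still reach the last step.
    backward = [set() for _ in range(k + 1)]
    backward[k] = forward[k]
    for j in range(k - 1, -1, -1):
        for h, _, t in triples:
            for a, b in ((h, t), (t, h)):
                if a in forward[j] and b in backward[j + 1]:
                    backward[j].add(a)

    # A triple survives if it connects two consecutive steps.
    kept = []
    for h, r, t in triples:
        if any((h in backward[j] and t in backward[j + 1]) or
               (t in backward[j] and h in backward[j + 1])
               for j in range(k)):
            kept.append((h, r, t))
    return kept


# Hypothetical toy graph; names carry no biological meaning.
toy_triples = [
    ("compound_A", "binds", "gene_1"),
    ("gene_1", "associates", "disease_X"),
    ("gene_1", "participates", "pathway_P"),
    ("gene_2", "associates", "disease_Y"),
]
toy_types = {
    "compound_A": "Compound", "gene_1": "Gene", "gene_2": "Gene",
    "disease_X": "Disease", "disease_Y": "Disease", "pathway_P": "Pathway",
}
filtered = filter_by_metapath(toy_triples, toy_types,
                              ("Compound", "Gene", "Disease"))
print(filtered)
```

In this toy case only the two triples on the compound-gene-disease walk survive; the pathway edge and the disconnected gene-disease edge are dropped, which mirrors the reduction in graph size the abstract reports.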

Speos: an ensemble graph representation learning framework to predict core gene candidates for complex diseases.

Ratajczak, Florin; Joblin, Mitchell; Hildebrandt, Marcel; Ringsquandl, Martin; Falter-Braun, Pascal; Heinig, Matthias.

Nat Commun; 14(1): 7206, 2023 Nov 08.

Article in English | MEDLINE | ID: mdl-37938585

ABSTRACT

Understanding phenotype-to-genotype relationships is a grand challenge of 21st century biology with translational implications. The recently proposed "omnigenic" model postulates that effects of genetic variation on traits are mediated by core genes and proteins whose activities mechanistically influence the phenotype, whereas peripheral genes encode a regulatory network that indirectly affects phenotypes via core gene products. Here, we develop a positive-unlabeled graph representation learning ensemble approach based on nested cross-validation to predict core-like genes for diverse diseases using Mendelian disorder genes for training. Employing mouse knockout phenotypes for external validation, we demonstrate that core-like genes display several key properties of core genes: mouse knockouts of genes corresponding to our most confident predictions give rise to relevant mouse phenotypes at rates on par with the Mendelian disorder genes, and all candidates exhibit core gene properties like transcriptional deregulation in disease and loss-of-function intolerance. Moreover, as predicted for core genes, our candidates are enriched for drug targets and druggable proteins. In contrast to Mendelian disorder genes, the new core-like genes are enriched for druggable yet untargeted gene products, which are therefore attractive targets for drug development. Interpretation of the underlying deep learning model suggests plausible explanations for our core gene predictions in the form of molecular mechanisms and physical interactions. Our results demonstrate the potential of graph representation learning for the interpretation of biological complexity and pave the way for studying core gene properties and future drug development.

Subject(s)

Craniocerebral Trauma, Animals, Mice, Drug Delivery Systems, Drug Development, Phenotype, RNA

Coalescent Processes with Skewed Offspring Distributions and Nonequilibrium Demography.

Matuszewski, Sebastian; Hildebrandt, Marcel E; Achaz, Guillaume; Jensen, Jeffrey D.

Genetics; 208(1): 323-338, 2018 Jan.

Article in English | MEDLINE | ID: mdl-29127263

ABSTRACT

Nonequilibrium demography impacts coalescent genealogies leaving detectable, well-studied signatures of variation. However, similar genomic footprints are also expected under models of large reproductive skew, posing a serious problem when trying to make inference. Furthermore, current approaches consider only one of the two processes at a time, neglecting any genomic signal that could arise from their simultaneous effects, preventing the possibility of jointly inferring parameters relating to both offspring distribution and population history. Here, we develop an extended Moran model with exponential population growth, and demonstrate that the underlying ancestral process converges to a time-inhomogeneous psi-coalescent. However, by applying a nonlinear change of time scale, analogous to the Kingman coalescent, we find that the ancestral process can be rescaled to its time-homogeneous analog, allowing the process to be simulated quickly and efficiently. Furthermore, we derive analytical expressions for the expected site-frequency spectrum under the time-inhomogeneous psi-coalescent, and develop an approximate-likelihood framework for the joint estimation of the coalescent and growth parameters. By means of extensive simulation, we demonstrate that both can be estimated accurately from whole-genome data. In addition, not accounting for demography can lead to serious biases in the inferred coalescent model, with broad implications for genomic studies ranging from ecology to conservation biology. Finally, we use our method to analyze sequence data from Japanese sardine populations, and find evidence of high variation in individual reproductive success, but few signs of a recent demographic expansion.

Subject(s)

Algorithms, Theoretical Models
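
The abstract above mentions a nonlinear change of time scale "analogous to the Kingman coalescent". The sketch below illustrates only that Kingman analogue, not the psi-coalescent rescaling derived in the paper. It assumes a population whose size, looking backward in time, is N(t) = N0 * exp(-rho * t), with t measured in units of N0 generations. Pairs of lineages then coalesce at rate exp(rho * t), and the rescaled time tau(t) = (exp(rho * t) - 1) / rho turns the process back into a standard, time-homogeneous Kingman coalescent. The function names and the parameter name `rho` are assumptions of this sketch.

```python
import math
import random


def to_rescaled_time(t, rho):
    """Map real time t to rescaled time tau = (exp(rho * t) - 1) / rho."""
    if rho == 0:
        return t
    return math.expm1(rho * t) / rho


def to_real_time(tau, rho):
    """Inverse map: t = log(1 + rho * tau) / rho."""
    if rho == 0:
        return tau
    return math.log1p(rho * tau) / rho


def coalescence_times(n, rho, rng):
    """Simulate the n - 1 coalescence times of a Kingman genealogy
    under exponential growth.

    Waiting times are drawn on the rescaled (time-homogeneous) scale,
    where k lineages coalesce at rate k * (k - 1) / 2, and each
    cumulative time is then mapped back to real time.
    """
    tau = 0.0
    times = []
    for k in range(n, 1, -1):
        tau += rng.expovariate(k * (k - 1) / 2)
        times.append(to_real_time(tau, rho))
    return times


# Same random seed with and without growth, for a sample of 10 lineages.
constant = coalescence_times(10, rho=0.0, rng=random.Random(1))
growing = coalescence_times(10, rho=5.0, rng=random.Random(1))
print(constant[-1], growing[-1])
```

Because log(1 + rho * tau) / rho is never larger than tau, growth compresses every coalescence time relative to the constant-size case drawn from the same random numbers.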

A Statistical Guide to the Design of Deep Mutational Scanning Experiments.

Matuszewski, Sebastian; Hildebrandt, Marcel E; Ghenu, Ana-Hermina; Jensen, Jeffrey D; Bank, Claudia.

Genetics; 204(1): 77-87, 2016 Sep.

Article in English | MEDLINE | ID: mdl-27412710

ABSTRACT

The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates.

Subject(s)

DNA Mutational Analysis/methods, High-Throughput Nucleotide Sequencing/methods, Statistical Models, Biological Evolution, Biometry, Genetic Fitness, Population Genetics/methods, Genetic Models, Mutation, Genetic Selection
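
To make the setting in the abstract above concrete, here is one common way to estimate a selection coefficient from time-sampled bulk competition data. It is offered as an assumption, not as the estimator used in the paper: if a mutant grows at rate s relative to the wild type, the log ratio of mutant to wild-type read counts changes linearly in time with slope s, so an ordinary least-squares fit of the log ratio against time recovers s. The function name and the toy counts are hypothetical.

```python
import math


def estimate_selection_coefficient(times, mutant_counts, wildtype_counts):
    """Least-squares slope of log(mutant / wild type) against time.

    Assumes all counts are positive and at least two distinct
    time points are sampled.
    """
    log_ratios = [math.log(m / w)
                  for m, w in zip(mutant_counts, wildtype_counts)]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(log_ratios) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean)
              for t, y in zip(times, log_ratios))
    return sxy / sxx


# Noise-free toy data: a deleterious mutant with s = -0.1 per generation.
sample_times = [0, 2, 4, 6, 8]
wildtype = [1000.0] * len(sample_times)
mutant = [1000.0 * math.exp(-0.1 * t) for t in sample_times]
print(estimate_selection_coefficient(sample_times, mutant, wildtype))
```

In the denominator `sxx`, spreading the sampled time points further apart increases the value and so reduces the variance of the slope, which is consistent with the abstract's point that the number and placement of time points matter for precision.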
