Search | Biblioteca Virtual em Saúde

Species-aware DNA language models capture regulatory elements and their evolution.

Karollus, Alexander; Hingerl, Johannes; Gankin, Dennis; Grosshauser, Martin; Klemon, Kristian; Gagneur, Julien.

Genome Biol ; 25(1): 83, 2024 04 02.

Article in English | MEDLINE | ID: mdl-38566111

ABSTRACT

BACKGROUND: The rise of large-scale multi-species genome sequencing projects promises to shed new light on how genomes encode gene regulatory instructions. To this end, new algorithms are needed that can leverage conservation to capture regulatory elements while accounting for their evolution.

RESULTS: Here, we introduce species-aware DNA language models, which we trained on more than 800 species spanning over 500 million years of evolution. Investigating their ability to predict masked nucleotides from context, we show that DNA language models distinguish transcription factor and RNA-binding protein motifs from background non-coding sequence. Owing to their flexibility, DNA language models capture conserved regulatory elements over much further evolutionary distances than sequence alignment would allow. Remarkably, DNA language models reconstruct motif instances bound in vivo better than unbound ones and account for the evolution of motif sequences and their positional constraints, showing that these models capture functional high-order sequence and evolutionary context. We further show that species-aware training yields improved sequence representations for endogenous and MPRA-based gene expression prediction, as well as motif discovery.

CONCLUSIONS: Collectively, these results demonstrate that species-aware DNA language models are a powerful, flexible, and scalable tool to integrate information from large compendia of highly diverged genomes.

Subjects

DNA, Nucleic Acid Regulatory Sequences, Binding Sites, Sequence Alignment, Algorithms, Conserved Sequence/genetics, Molecular Evolution

Genetically encoded barcodes for correlative volume electron microscopy.

Sigmund, Felix; Berezin, Oleksandr; Beliakova, Sofia; Magerl, Bernhard; Drawitsch, Martin; Piovesan, Alberto; Gonçalves, Filipa; Bodea, Silviu-Vasile; Winkler, Stefanie; Bousraou, Zoe; Grosshauser, Martin; Samara, Eleni; Pujol-Martí, Jesús; Schädler, Sebastian; So, Chun; Irsen, Stephan; Walch, Axel; Kofler, Florian; Piraud, Marie; Kornfeld, Joergen; Briggman, Kevin; Westmeyer, Gil Gregor.

Nat Biotechnol ; 41(12): 1734-1745, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37069313

ABSTRACT

While genetically encoded reporters are common for fluorescence microscopy, equivalent multiplexable gene reporters for electron microscopy (EM) are still scarce. Here, by installing a variable number of fixation-stable metal-interacting moieties in the lumen of encapsulin nanocompartments of different sizes, we developed a suite of spherically symmetric and concentric barcodes (EMcapsulins) that are readable by standard EM techniques. Six classes of EMcapsulins could be automatically segmented and differentiated. The coding capacity was further increased by arranging several EMcapsulins into distinct patterns via a set of rigid spacers of variable length. Fluorescent EMcapsulins were expressed to monitor subcellular structures in light and EM. Neuronal expression in Drosophila and mouse brains enabled the automatic identification of genetically defined cells in EM. EMcapsulins are compatible with transmission EM, scanning EM and focused ion beam scanning EM. The expandable palette of genetically controlled EM-readable barcodes can augment anatomical EM images with multiplexed gene expression maps.

Subjects

Drosophila, Volume Electron Microscopy, Animals, Mice, Scanning Electron Microscopy, Drosophila/genetics, Neurons, Fluorescence Microscopy/methods

Re-evaluating Deep Neural Networks for Phylogeny Estimation: The Issue of Taxon Sampling.

Zaharias, Paul; Grosshauser, Martin; Warnow, Tandy.

J Comput Biol ; 29(1): 74-89, 2022 01.

Article in English | MEDLINE | ID: mdl-34986031

ABSTRACT

Deep neural networks (DNNs) have been recently proposed for quartet tree phylogeny estimation. Here, we present a study evaluating recently trained DNNs in comparison to a collection of standard phylogeny estimation methods on a heterogeneous collection of datasets simulated under the same models that were used to train the DNNs, and also under similar conditions but with higher rates of evolution. Our study shows that using DNNs with quartet amalgamation is less accurate than several standard phylogeny estimation methods we explore (e.g., maximum likelihood and maximum parsimony). We further find that simple standard phylogeny estimation methods match or improve on DNNs for quartet accuracy, especially, but not exclusively, when used in a global manner (i.e., the tree on the full dataset is computed and then the induced quartet trees are extracted from the full tree). Thus, our study provides evidence that a major challenge impacting the utility of current DNNs for phylogeny estimation is their restriction to estimating quartet trees that must subsequently be combined into a tree on the full dataset. In contrast, global methods (i.e., those that estimate trees from the full set of sequences) are able to benefit from taxon sampling, and hence have higher accuracy on large datasets.

Subjects

Deep Learning, Computer Neural Networks, Phylogeny, Amino Acid Sequence, Classification/methods, Computational Biology, Computer Simulation, Genetic Databases/statistics & numerical data, Molecular Evolution

Contribution potential of glaciers to water availability in different climate regimes.

Kaser, Georg; Grosshauser, Martin; Marzeion, Ben.

Proc Natl Acad Sci U S A ; 107(47): 20223-7, 2010 Nov 23.

Article in English | MEDLINE | ID: mdl-21059938

ABSTRACT

Although reliable figures are often missing, considerable detrimental changes due to shrinking glaciers are universally expected for water availability in river systems under the influence of ongoing global climate change. We estimate the contribution potential of seasonally delayed glacier melt water to total water availability in large river systems. We find that the seasonally delayed glacier contribution is largest where rivers enter seasonally arid regions and negligible in the lowlands of river basins governed by monsoon climates. By comparing monthly glacier melt contributions with population densities in different altitude bands within each river basin, we demonstrate that strong human dependence on glacier melt is not collocated with highest population densities in most basins.

Subjects

Climate Change, Ice Cover, Theoretical Models, Rivers, Water Supply, Seasons
