Search | Secretaria de Estado da Saúde (State Department of Health)

1.

BioDeepfuse: a hybrid deep learning approach with integrated feature extraction techniques for enhanced non-coding RNA classification.

Avila Santos, Anderson P; de Almeida, Breno L S; Bonidia, Robson P; Stadler, Peter F; Stefanic, Polonca; Mandic-Mulec, Ines; Rocha, Ulisses; Sanches, Danilo S; de Carvalho, André C P L F.

RNA Biol ; 21(1): 1-12, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38528797

ABSTRACT

The accurate classification of non-coding RNA (ncRNA) sequences is pivotal for advanced non-coding genome annotation and analysis, a fundamental aspect of genomics that facilitates understanding of ncRNA functions and regulatory mechanisms in various biological processes. While traditional machine learning approaches have been employed to distinguish ncRNAs, they often require extensive feature engineering. Recently, deep learning algorithms have advanced ncRNA classification. This study presents BioDeepFuse, a hybrid deep learning framework that integrates convolutional neural networks (CNN) or bidirectional long short-term memory (BiLSTM) networks with handcrafted features for enhanced accuracy. The framework combines k-mer one-hot encoding, k-mer dictionary encoding, and feature extraction techniques for input representation. Embedding the extracted features into the deep network allows it to exploit both the spatial and the sequential patterns of ncRNA sequences. We evaluated the performance of BioDeepFuse on benchmark datasets and on real-world RNA samples from bacterial organisms. The results showed high accuracy in ncRNA classification, underscoring the robustness of the tool in handling complex ncRNA sequence data. The effective combination of CNN or BiLSTM with external features opens promising directions for future research, particularly in refining ncRNA classifiers and deepening insights into the roles of ncRNAs in cellular processes and disease. Beyond its original application to bacterial organisms, the methods and techniques integrated into the framework could make BioDeepFuse effective in broader domains.

Subjects

Deep Learning; RNA, Untranslated/genetics; Algorithms; RNA; Neural Networks, Computer
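To make the two sequence encodings named in this abstract concrete, here is a minimal Python sketch of k-mer dictionary (integer-token) encoding and k-mer one-hot encoding. It is not BioDeepFuse's code: the function names, the ACGU alphabet, the use of overlapping k-mers, and the convention of reserving token 0 for k-mers with unknown letters are assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; sequences are expected in upper case


def kmer_vocabulary(k, start=0):
    """Map every k-mer over ALPHABET to an integer index, in lexicographic order."""
    return {"".join(p): i + start for i, p in enumerate(product(ALPHABET, repeat=k))}


def kmer_dictionary_encoding(seq, k=3):
    """Encode overlapping k-mers as integer tokens; 0 marks k-mers with unknown letters."""
    vocab = kmer_vocabulary(k, start=1)
    return [vocab.get(seq[i:i + k], 0) for i in range(len(seq) - k + 1)]


def kmer_one_hot(seq, k=1):
    """Encode overlapping k-mers as one-hot rows of length 4**k."""
    vocab = kmer_vocabulary(k)
    rows = []
    for i in range(len(seq) - k + 1):
        row = [0] * len(vocab)
        idx = vocab.get(seq[i:i + k])
        if idx is not None:  # k-mers containing e.g. 'N' stay all-zero
            row[idx] = 1
        rows.append(row)
    return rows


print(kmer_dictionary_encoding("ACGU", k=2))  # [2, 7, 12]
print(kmer_one_hot("AG"))  # [[1, 0, 0, 0], [0, 0, 1, 0]]
```

In a hybrid model of this kind, the integer tokens would typically feed an embedding layer, while the one-hot rows can be passed directly to a CNN or BiLSTM.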

2.

The Theory of Gene Family Histories.

Hellmuth, Marc; Stadler, Peter F.

Methods Mol Biol ; 2802: 1-32, 2024.

Article in English | MEDLINE | ID: mdl-38819554

ABSTRACT

Most genes are part of larger families of evolutionarily related genes. The history of gene families typically involves duplications and losses of genes as well as horizontal transfers into other organisms. The reconstruction of detailed gene family histories, i.e., the precise dating of evolutionary events relative to the phylogenetic tree of the underlying species, has remained a challenging topic despite its importance as a basis for detailed investigations into adaptation and functional evolution of individual members of the gene family. The identification of orthologs, moreover, is a particularly important subproblem of the more general setting considered here. In the last few years, an extensive body of mathematical results has appeared that tightly links orthology, a formal notion of best matches among genes, and horizontal gene transfer. The purpose of this chapter is to broadly outline some of the key mathematical insights and to discuss their implications for practical applications. In particular, we focus on tree-free methods, i.e., methods to infer orthology or horizontal gene transfer, as well as gene trees, species trees, and reconciliations between them, without using a priori knowledge of the underlying trees or statistical models for the inference of phylogenetic trees. Instead, the initial step aims to extract binary relations among genes.

Subjects

Evolution, Molecular; Gene Transfer, Horizontal; Multigene Family; Phylogeny; Models, Genetic; Computational Biology/methods
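As one concrete example of the binary relations this abstract refers to: a well-known result in this literature is that the orthology relation induced by an event-labeled gene tree is a cograph, i.e., a graph with no induced path on four vertices (P4). The brute-force check below is a hypothetical helper for small graphs (it inspects all 4-vertex subsets), not a method taken from the chapter; linear-time cograph recognition algorithms exist for realistic data sizes.

```python
from itertools import combinations


def has_induced_p4(vertices, edges):
    """Return True if some 4 vertices induce a path a-b-c-d (brute force)."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for quad in combinations(vertices, 4):
        members = set(quad)
        # On 4 vertices, the induced-degree sequence (1, 1, 2, 2) occurs only for P4.
        degrees = sorted(len(adj[v] & members) for v in quad)
        if degrees == [1, 1, 2, 2]:
            return True
    return False


def is_cograph(vertices, edges):
    """A graph is a cograph exactly when it has no induced P4."""
    return not has_induced_p4(vertices, edges)


# Hypothetical estimated orthology graphs on four genes.
print(is_cograph("abcd", [("a", "b"), ("b", "c"), ("c", "d")]))  # False: a P4
print(is_cograph("abcd", [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]))  # True
```

An estimated relation that fails this test cannot be an exact orthology relation, which is one way such theory can flag errors before any tree is built.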

3.

Cavity approach for the approximation of spectral density of graphs with heterogeneous structures.

Guzman, Grover E C; Stadler, Peter F; Fujita, Andre.

Phys Rev E ; 109(3-1): 034303, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38632720

ABSTRACT

Graphs have become widely used to represent and study social, biological, and technological systems. Statistical methods for analyzing empirical graphs based on the graph's spectral density have been proposed. However, their running time is cubic in the number of vertices, precluding direct application to large instances. Efficient algorithms to calculate the spectral density are therefore necessary. For sparse graphs, the cavity method can efficiently approximate the spectral density of locally treelike undirected and directed graphs. However, it does not apply to most empirical graphs because they have heterogeneous structures. We therefore propose methods for undirected and directed graphs with heterogeneous structures that combine a new definition of a vertex's neighborhood with the cavity approach. The time and space complexities of our methods are $O(|E|\,h_{\max}^{3}\,t)$ and $O(|E|\,h_{\max}^{2}\,t)$, respectively, where $|E|$ is the number of edges, $h_{\max}$ is the size of the largest local neighborhood of a vertex, and $t$ is the number of iterations required for convergence. We demonstrate the practical efficacy of the methods by estimating the spectral density of simulated and real-world undirected and directed graphs.
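For context, here is a minimal sketch of the classical cavity approach for an undirected, unweighted graph, i.e., the locally treelike baseline that this abstract says breaks down on heterogeneous graphs; it does not implement the authors' new neighborhood definition. The density is smoothed with a small imaginary part $\varepsilon$, $\rho(\lambda) \approx -\frac{1}{\pi n}\sum_i \operatorname{Im} G_{ii}(\lambda + i\varepsilon)$, and each cavity message is a diagonal resolvent entry of a vertex with one neighbor removed. On a tree these equations are exact, so the sketch compares them against full diagonalization, which costs cubic time. The function names are hypothetical.

```python
import numpy as np


def cavity_spectral_density(edges, n, lam, eps=1e-3, iters=200):
    """Approximate rho(lam) of an undirected, unweighted graph with the classical
    cavity equations (exact on trees, approximate on locally treelike graphs)."""
    z = lam + 1j * eps
    nbrs = [[] for _ in range(n)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    # cavity[(i, j)]: resolvent entry G_ii computed on the graph with neighbor j removed
    cavity = {(i, j): 1.0 / z for i in range(n) for j in nbrs[i]}
    for _ in range(iters):
        cavity = {
            (i, j): 1.0 / (z - sum(cavity[(k, i)] for k in nbrs[i] if k != j))
            for (i, j) in cavity
        }
    # full diagonal resolvent entries from the converged cavity messages
    g = [1.0 / (z - sum(cavity[(k, i)] for k in nbrs[i])) for i in range(n)]
    return -np.imag(np.sum(g)) / (np.pi * n)


def exact_spectral_density(edges, n, lam, eps=1e-3):
    """Reference value from all eigenvalues of the adjacency matrix (O(n^3))."""
    a = np.zeros((n, n))
    for u, v in edges:
        a[u, v] = a[v, u] = 1.0
    mu = np.linalg.eigvalsh(a)
    return -np.imag(np.sum(1.0 / (lam + 1j * eps - mu))) / (np.pi * n)


tree = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5)]
for lam in (0.0, 0.5, 1.5):
    print(lam, cavity_spectral_density(tree, 6, lam, eps=0.05),
          exact_spectral_density(tree, 6, lam, eps=0.05))
```

Each sweep costs time proportional to the sum of squared degrees, which is why cavity-type methods scale to large sparse graphs; on graphs with many short cycles the messages are no longer exact, which is the gap the paper addresses.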

4.

Assessing the Quality of Cotranscriptional Folding Simulations.

Kühnl, Felix; Stadler, Peter F; Findeiß, Sven.

Methods Mol Biol ; 2726: 347-376, 2024.

Article in English | MEDLINE | ID: mdl-38780738

ABSTRACT

Structural changes in RNAs are an important contributor to controlling gene expression not only at the posttranscriptional stage but also during transcription. A subclass of riboswitches and RNA thermometers located in the 5' region of the primary transcript regulates the downstream functional unit (usually an ORF) through premature termination of transcription. Not only do such elements occur naturally, they are also attractive devices in synthetic biology. The possibility of designing such riboswitches or RNA thermometers is thus of considerable practical interest. Since these functional RNA elements already act during transcription, it is important to model and understand the dynamics of folding and, in particular, the formation of intermediate structures concurrently with transcription. Cotranscriptional folding simulations are therefore an important step to verify the functionality of design constructs before conducting expensive and labor-intensive wet lab experiments. For RNAs, full-fledged molecular dynamics simulations are far beyond practical reach because of both the size of the molecules and the timescales of interest. Even at the simplified level of secondary structures, further approximations are necessary. The BarMap approach represents the secondary structure landscape for each individual transcription step by a coarse-grained model that retains only a small set of low-energy local minima and the energy barriers between them. The folding dynamics between two transcriptional elongation steps are modeled as a Markov process on this representation. Maps between pairs of consecutive coarse-grained landscapes make it possible to follow the folding process as it changes in response to transcription elongation. In its original implementation, the BarMap software provides a general framework to investigate RNA folding dynamics on temporally changing landscapes.
It is, however, difficult to use, in particular for specific scenarios such as cotranscriptional folding. To overcome this limitation, we developed the user-friendly BarMap-QA pipeline described in detail in this contribution. It is illustrated here by an elaborate example that emphasizes the careful monitoring of several quality measures. Using an iterative workflow, a reliable and complete kinetics simulation of a synthetic, transcription-regulating riboswitch is obtained with minimal computational resources. All programs and scripts used in this contribution are free software and available for download as a source distribution for Linux® or as a platform-independent Docker® image, including support for Apple macOS® and Microsoft Windows®.

Subjects

Molecular Dynamics Simulation; Nucleic Acid Conformation; RNA Folding; Transcription, Genetic; Riboswitch/genetics; RNA/chemistry; RNA/genetics; Software
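To illustrate the two ingredients described in this abstract (Markov dynamics on a coarse-grained landscape, and a map that carries populations to the next landscape after one elongation step), here is a simplified numpy sketch. It is not BarMap or BarMap-QA code: the Arrhenius-type rate model based on saddle energies, the value of kT, the toy energies, and the list-based mapping are illustrative assumptions, and all function names are hypothetical.

```python
import numpy as np

KT = 0.61632  # assumed RT in kcal/mol at 37 °C


def rate_matrix(energies, barriers, kt=KT):
    """Generator Q with Q[j, i] = rate of i -> j (Arrhenius-type, assumed model).

    energies[i]: free energy of coarse-grained minimum i.
    barriers[i][j]: saddle energy between i and j (symmetric; inf if not adjacent)."""
    n = len(energies)
    q = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and np.isfinite(barriers[i][j]):
                q[j, i] = np.exp(-(barriers[i][j] - energies[i]) / kt)
    q -= np.diag(q.sum(axis=0))  # columns sum to zero, so probability is conserved
    return q


def propagate(q, energies, p0, t, kt=KT):
    """p(t) = exp(Q t) p0, computed via the symmetrized generator; valid because
    the rates above satisfy detailed balance w.r.t. the Boltzmann distribution."""
    e = np.asarray(energies, dtype=float)
    pi = np.exp(-(e - e.min()) / kt)
    pi /= pi.sum()
    sq = np.sqrt(pi)
    s = q * sq[np.newaxis, :] / sq[:, np.newaxis]  # S = Pi^(-1/2) Q Pi^(1/2)
    w, v = np.linalg.eigh((s + s.T) / 2)
    return sq * (v @ (np.exp(w * t) * (v.T @ (np.asarray(p0, dtype=float) / sq))))


def map_populations(p, mapping, n_next):
    """Carry populations to the next landscape; mapping[i] is the minimum that
    minimum i turns into after one elongation step (hypothetical format)."""
    nxt = np.zeros(n_next)
    for i, target in enumerate(mapping):
        nxt[target] += p[i]
    return nxt


energies = [0.0, -1.0, -0.5]  # toy minima (kcal/mol)
inf = np.inf
barriers = [[inf, 2.0, 3.0], [2.0, inf, 1.5], [3.0, 1.5, inf]]
q = rate_matrix(energies, barriers)
p = propagate(q, energies, [1.0, 0.0, 0.0], t=5.0)
p_next = map_populations(p, mapping=[0, 0, 1], n_next=2)
print(p, p_next)
```

Alternating `propagate` and `map_populations` once per elongation step yields a toy cotranscriptional trajectory; a real pipeline additionally has to check how much population each coarse-grained map loses or misassigns.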

5.

Evolution of neuropeptide Y/RFamide-like receptors in nematodes.

Reinhardt, Franziska; Kaiser, Anette; Prömel, Simone; Stadler, Peter F.

Heliyon ; 10(14): e34473, 2024 Jul 30.

Article in English | MEDLINE | ID: mdl-39130429

ABSTRACT

The Neuropeptide Y/RFamide-like receptors belong to the Rhodopsin-like G protein-coupled receptors (GPCRs) and are involved in functions such as locomotion, feeding, and reproduction. With 41 described receptors, they form the best-studied group of neuropeptide GPCRs in Caenorhabditis elegans. To understand the expansion of the Neuropeptide Y/RFamide-like receptor family in nematodes, we used the sequences of selected receptor paralogs in C. elegans as queries and surveyed the corresponding orthologous sequences in another 159 representative nematode target genomes. To this end, we employed an automated pipeline based on ExonMatchSolver, a tool that solves the paralog-to-contig assignment problem. Using subclass-specific HMMs, we detected a total of 1557 Neuropeptide Y/RFamide-like receptor sequences (1100 NPRs, 375 FRPRs, and 82 C09F12.3) in the 159 target nematode genomes investigated here. These sequences show good conservation of the Neuropeptide Y/RFamide-like receptors across the Nematoda and highlight the diversification of the family in nematode evolution. No other genus shares all Neuropeptide Y/RFamide-like receptors with the genus Caenorhabditis. At the same time, we observe large numbers of clade-specific duplications and losses of family members across the phylum Nematoda.
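As a much-simplified illustration of an assignment problem of the paralog-to-contig kind, the sketch below assigns each paralog query to a distinct contig so that the summed alignment score is maximal. It is a hypothetical toy, not ExonMatchSolver's formulation; the score matrices are invented, and brute force over permutations is feasible only for a handful of paralogs (it also assumes there are at least as many contigs as paralogs).

```python
from itertools import permutations


def best_assignment(scores):
    """scores[p][c]: alignment score of paralog query p against contig c.

    Returns (total, {paralog: contig}) maximizing the summed score, with each
    contig used at most once. Brute force; only for tiny toy inputs."""
    n_par = len(scores)
    n_con = len(scores[0]) if scores else 0
    best = (float("-inf"), {})
    for contigs in permutations(range(n_con), n_par):
        total = sum(scores[p][c] for p, c in enumerate(contigs))
        if total > best[0]:
            best = (total, dict(enumerate(contigs)))
    return best


print(best_assignment([[5, 1], [2, 4]]))        # (9, {0: 0, 1: 1})
print(best_assignment([[1, 5, 0], [4, 6, 0]]))  # (9, {0: 1, 1: 0})
```

The second example shows why the assignment has to be solved jointly: giving paralog 1 its best contig first (score 6) leaves paralog 0 with a total of only 7, below the optimum of 9.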

6.

Comparative RNA Genomics.

Backofen, Rolf; Gorodkin, Jan; Hofacker, Ivo L; Stadler, Peter F.

Methods Mol Biol ; 2802: 347-393, 2024.

Article in English | MEDLINE | ID: mdl-38819565

ABSTRACT

Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component of bacterial gene regulation. A common theme of these mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible non-coding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a (necessarily incomplete) overview of the current state of comparative analysis of non-coding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.

Subjects

Computational Biology; Genomics; Humans; Computational Biology/methods; Genomics/methods; Nucleic Acid Conformation; RNA/genetics; RNA, Untranslated/genetics; Sequence Analysis, RNA/methods

7.

Phylogenetic and Chemical Probing Information as Soft Constraints in RNA Secondary Structure Prediction.

von Löhneysen, Sarah; Spicher, Thomas; Varenyk, Yuliia; Yao, Hua-Ting; Lorenz, Ronny; Hofacker, Ivo; Stadler, Peter F.

J Comput Biol ; 31(6): 549-563, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38935442

ABSTRACT

Extrinsic, experimental information can be incorporated into thermodynamics-based RNA folding algorithms in the form of pseudo-energies. Evolutionary conservation of RNA secondary structure elements is detectable in alignments of phylogenetically related sequences and provides evidence for the presence of certain base pairs that can also be converted into pseudo-energy contributions. We show that the centroid base pairs computed from a consensus folding model such as RNAalifold result in a substantial improvement of the prediction accuracy for single sequences. Evidence for specific base pairs turns out to be more informative than a position-wise profile for the conservation of the pairing status. A comparison with chemical probing data, furthermore, strongly suggests that phylogenetic base pairing data are more informative than position-specific data on (un)pairedness as obtained from chemical probing experiments. In this context we demonstrate, in addition, that the conversion of signal from probing data into pseudo-energies is possible using thermodynamic structure predictions as a reference instead of known RNA structures.

Subjects

Algorithms; Nucleic Acid Conformation; Phylogeny; RNA; Thermodynamics; RNA/chemistry; RNA/genetics; Base Pairing; RNA Folding; Base Sequence; Computational Biology/methods
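A small sketch of the two kinds of conversion discussed in this abstract: selecting the centroid base pairs (the pairs with probability above 0.5, which together form the centroid structure) from a consensus base-pair probability matrix and turning them into per-pair pseudo-energy bonuses, and converting a chemical probing reactivity into a pseudo-energy with the familiar Deigan-style formula $m \ln(r + 1) + b$. The bonus value, the slope `m` and intercept `b`, and the function names are illustrative assumptions; the calibrated conversions studied in the paper are not reproduced here.

```python
import math


def centroid_pairs(bpp):
    """Base pairs (i, j), i < j, with probability > 0.5 in the matrix bpp;
    these pairs are mutually compatible and form the centroid structure."""
    n = len(bpp)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if bpp[i][j] > 0.5]


def pair_bonus_constraints(pairs, bonus=-1.0):
    """Hypothetical soft constraint: a fixed pseudo-energy bonus (kcal/mol)
    for each base pair supported by phylogenetic evidence."""
    return {pair: bonus for pair in pairs}


def deigan_pseudo_energy(reactivity, m=1.8, b=-0.6):
    """Deigan-style probing pseudo-energy m * ln(reactivity + 1) + b (kcal/mol);
    m and b here are illustrative values."""
    return m * math.log(reactivity + 1.0) + b


# Toy consensus base-pair probabilities for a 3-nt example (symmetric matrix).
bpp = [[0.0, 0.9, 0.1],
       [0.9, 0.0, 0.2],
       [0.1, 0.2, 0.0]]
print(pair_bonus_constraints(centroid_pairs(bpp)))  # {(0, 1): -1.0}
print(deigan_pseudo_energy(0.0))  # -0.6
```

The difference in granularity is visible in the signatures: the phylogenetic term rewards specific pairs (i, j), whereas the probing term depends only on a per-position reactivity.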

8.

Reaction rebalancing: a novel approach to curating reaction databases.

Phan, Tieu-Long; Weinbauer, Klaus; Gärtner, Thomas; Merkle, Daniel; Andersen, Jakob L; Fagerberg, Rolf; Stadler, Peter F.

J Cheminform ; 16(1): 82, 2024 Jul 19.

Article in English | MEDLINE | ID: mdl-39030583

ABSTRACT

PURPOSE: Reaction databases are a key resource for a wide variety of applications in computational chemistry and biochemistry, including Computer-aided Synthesis Planning (CASP) and the large-scale analysis of metabolic networks. The full potential of these resources can only be realized if datasets are accurate and complete. Missing co-reactants and co-products, i.e., unbalanced reactions, however, are the rule rather than the exception. The curation and correction of such incomplete entries is thus an urgent need. METHODS: The SynRBL framework addresses this issue with a dual strategy: a rule-based method for non-carbon compounds, which uses atomic symbols and counts for prediction, alongside a Maximum Common Subgraph (MCS)-based technique for carbon compounds, which aligns reactants and products to infer missing entities. RESULTS: The rule-based method exceeded 99% accuracy, while MCS-based accuracy varied from 81.19 to 99.33%, depending on reaction properties. Furthermore, an applicability domain and a machine learning scoring function were devised to quantify prediction confidence. The overall efficacy of the framework was assessed through its success rate and accuracy, which ranged from 89.83 to 99.75% and from 90.85 to 99.05%, respectively. CONCLUSION: The SynRBL framework offers a novel solution for rebalancing chemical reactions, significantly enhancing reaction completeness. With rigorous validation, it achieved groundbreaking accuracy in reaction rebalancing. This sets the stage for future improvements, in particular of atom-atom mapping techniques as well as of downstream tasks such as automated synthesis planning. SCIENTIFIC CONTRIBUTION: SynRBL features a novel computational approach to correcting unbalanced entries in chemical reaction databases.
By combining heuristic rules for inferring non-carbon compounds with common subgraph searches to address carbon imbalance, SynRBL successfully addresses most instances of this problem, which affects the majority of data in most large-scale resources. Compared to alternative solutions, SynRBL achieves a dramatic increase in both success rate and accuracy, and provides the first freely available open-source solution for this problem.
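A toy version of the rule-based idea for non-carbon compounds: compare element counts on both sides of a reaction and look up the surplus in a short list of common co-products. SynRBL itself works on real reaction data with its own rules; the flat-formula parser (no brackets or charges), the candidate list, and the function names below are hypothetical simplifications.

```python
import re
from collections import Counter

# Hypothetical list of small non-carbon co-products to try.
CANDIDATES = ["H2O", "HCl", "HBr", "NH3", "H2", "O2"]


def parse_formula(formula):
    """Count atoms in a flat molecular formula such as 'C2H5OH' (no brackets)."""
    counts = Counter()
    for element, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(num) if num else 1
    return counts


def total_atoms(formulas):
    total = Counter()
    for f in formulas:
        total += parse_formula(f)
    return total


def infer_missing_product(reactants, products, candidates=CANDIDATES):
    """Return the candidate whose atoms exactly cover the reactant surplus, else None."""
    diff = total_atoms(reactants)
    diff.subtract(total_atoms(products))
    if any(v < 0 for v in diff.values()):
        return None  # products carry extra atoms: a reactant is missing instead
    diff = +diff  # keep only strictly positive counts
    for cand in candidates:
        if parse_formula(cand) == diff:
            return cand
    return None


# Esterification written without its co-product.
print(infer_missing_product(["CH3COOH", "C2H5OH"], ["CH3COOC2H5"]))  # H2O
```

A single exact lookup like this covers only the simplest cases; reactions that are missing several molecules, or carbon-containing partners, need the search-based treatment the abstract describes.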

9.

Comprehensive survey of conserved RNA secondary structures in full-genome alignment of Hepatitis C virus.

Triebel, Sandra; Lamkiewicz, Kevin; Ontiveros, Nancy; Sweeney, Blake; Stadler, Peter F; Petrov, Anton I; Niepmann, Michael; Marz, Manja.

Sci Rep ; 14(1): 15145, 2024 07 02.

Article in English | MEDLINE | ID: mdl-38956134

ABSTRACT

Hepatitis C virus (HCV) is a plus-stranded RNA virus that often chronically infects liver hepatocytes and causes liver cirrhosis and cancer. These viruses replicate their genomes using error-prone replicases. As a result, they routinely generate a large 'cloud' of RNA genomes (quasispecies) which, by trial and error, comprehensively explore the sequence space available for functional RNA genomes that maintain the ability for efficient replication and immune escape. In this context, it is important to identify which RNA secondary structures in the sequence space of the HCV genome are conserved, likely due to functional requirements. Here, we provide the first genome-wide multiple sequence alignment (MSA) with the prediction of RNA secondary structures throughout all representative full-length HCV genomes. We selected 57 representative genomes by clustering all complete HCV genomes from the BV-BRC database based on k-mer distributions and dimension reduction, and by adding RefSeq sequences. We include annotations of previously recognized features for easy comparison with other studies. Our results indicate that mainly the core coding region, the C-terminal NS5A region, and the NS5B region contain secondary structure elements that are conserved beyond coding sequence requirements, indicating functionality at the RNA level. In contrast, the genome regions in between contain less highly conserved structures. The results provide a complete description of all conserved RNA secondary structures and make clear that functionally important RNA secondary structures are present in certain HCV genome regions but are largely absent from other regions. Full-genome alignments of all branches of Hepacivirus C are provided in the supplement.

Subjects

Conserved Sequence; Genome, Viral; Hepacivirus; Nucleic Acid Conformation; RNA, Viral; Hepacivirus/genetics; RNA, Viral/genetics; RNA, Viral/chemistry; Humans; Sequence Alignment; Hepatitis C/virology; Hepatitis C/genetics
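The genome selection step in this abstract (k-mer distributions followed by dimension reduction) can be sketched as k-mer frequency profiles projected with principal component analysis. PCA via SVD is an assumed stand-in, since the abstract does not name the reduction method, and the clustering that would follow is omitted. The sketch assumes genomes written with T, as in GenBank-style records; the function names and toy sequences are hypothetical.

```python
from itertools import product

import numpy as np


def kmer_profile(seq, k=3, alphabet="ACGT"):
    """Relative frequencies of overlapping k-mers; k-mers with other letters are skipped."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    v = np.zeros(len(index))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            v[j] += 1
    return v / max(v.sum(), 1.0)


def pca_embed(profiles, n_components=2):
    """Project centered profiles onto their leading principal axes (via SVD)."""
    x = np.asarray(profiles, dtype=float)
    x = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T


seqs = ["ACGTACGTAC", "ACGTACGTAC", "GGGGCCCCTT", "TTTTAAAAGG"]  # toy "genomes"
emb = pca_embed([kmer_profile(s) for s in seqs])
print(emb.round(3))
```

Identical k-mer spectra land on the same point, so one representative per cluster of nearby points can be kept; real genomes would need a longer k and a proper clustering method.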

10.

Mechanistic insights into G-protein coupling with an agonist-bound G-protein-coupled receptor.

Batebi, Hossein; Pérez-Hernández, Guillermo; Rahman, Sabrina N; Lan, Baoliang; Kamprad, Antje; Shi, Mingyu; Speck, David; Tiemann, Johanna K S; Guixà-González, Ramon; Reinhardt, Franziska; Stadler, Peter F; Papasergi-Scott, Makaía M; Skiniotis, Georgios; Scheerer, Patrick; Kobilka, Brian K; Mathiesen, Jesper M; Liu, Xiangyu; Hildebrand, Peter W.

Nat Struct Mol Biol ; 2024 Jun 12.

Article in English | MEDLINE | ID: mdl-38867113

ABSTRACT

G-protein-coupled receptors (GPCRs) activate heterotrimeric G proteins by promoting guanine nucleotide exchange. Here, we investigate the coupling of G proteins with GPCRs and describe the events that ultimately lead to the ejection of GDP from its binding pocket in the Gα subunit, the rate-limiting step during G-protein activation. Using molecular dynamics simulations, we investigate the temporal progression of structural rearrangements of GDP-bound Gs protein (Gs·GDP; hereafter GsGDP) upon coupling to the β2-adrenergic receptor (β2AR) in atomic detail. The binding of GsGDP to the β2AR is followed by long-range allosteric effects that significantly reduce the energy needed for GDP release: the opening of α1-αF helices, the displacement of the αG helix and the opening of the α-helical domain. Signal propagation to the Gs occurs through an extended receptor interface, including a lysine-rich motif at the intracellular end of a kinked transmembrane helix 6, which was confirmed by site-directed mutagenesis and functional assays. From this β2AR-GsGDP intermediate, Gs undergoes an in-plane rotation along the receptor axis to approach the β2AR-Gsempty state. The simulations shed light on how the structural elements at the receptor-G-protein interface may interact to transmit the signal over 30 Å to the nucleotide-binding site. Our analysis extends the current limited view of nucleotide-free snapshots to include additional states and structural features responsible for signaling and G-protein coupling specificity.

11.

Taking identity-by-descent analysis into the wild: Estimating realized relatedness in free-ranging macaques.

Freudiger, Annika; Jovanovic, Vladimir M; Huang, Yilei; Snyder-Mackler, Noah; Conrad, Donald F; Miller, Brian; Montague, Michael J; Westphal, Hendrikje; Stadler, Peter F; Bley, Stefanie; Horvath, Julie E; Brent, Lauren J N; Platt, Michael L; Ruiz-Lambides, Angelina; Tung, Jenny; Nowick, Katja; Ringbauer, Harald; Widdig, Anja.

bioRxiv ; 2024 Jan 11.

Article in English | MEDLINE | ID: mdl-38260273

ABSTRACT

Biological relatedness is a key consideration in studies of behavior, population structure, and trait evolution. Except for parent-offspring dyads, pedigrees capture relatedness imperfectly. The number and length of DNA segments that are identical-by-descent (IBD) yield the most precise estimates of relatedness. Here, we leverage novel methods for estimating locus-specific IBD from low coverage whole genome resequencing data to demonstrate the feasibility and value of resolving fine-scaled gradients of relatedness in free-living animals. Using primarily 4-6× coverage data from a rhesus macaque (Macaca mulatta) population with available long-term pedigree data, we show that we can call the number and length of IBD segments across the genome with high accuracy even at 0.5× coverage. The resulting estimates demonstrate substantial variation in genetic relatedness within kin classes, leading to overlapping distributions between kin classes. They identify cryptic genetic relatives that are not represented in the pedigree and reveal elevated recombination rates in females relative to males, which allows us to discriminate maternal and paternal kin using genotype data alone. Our findings represent a breakthrough in the ability to understand the predictors and consequences of genetic relatedness in natural populations, contributing to our understanding of a fundamental component of population structure in the wild.
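Once IBD segments have been called, the standard conversion to a relatedness coefficient is $r = k_1/2 + k_2$, where $k_1$ and $k_2$ are the fractions of the genome shared IBD on one and on both haplotypes (the kinship coefficient is $r/2$). The segment format (start, end, number of shared haplotypes) and the function name are assumptions; the calling of segments from low-coverage data, which is the paper's contribution, is not shown.

```python
def relatedness_from_ibd(segments, genome_length):
    """Coefficient of relatedness r = k1/2 + k2 from called IBD segments.

    segments: (start, end, shared) tuples, shared = 1 (IBD1) or 2 (IBD2), in the
    same units as genome_length (e.g. cM or bp); assumed non-overlapping."""
    ibd1 = sum(end - start for start, end, shared in segments if shared == 1)
    ibd2 = sum(end - start for start, end, shared in segments if shared == 2)
    k1, k2 = ibd1 / genome_length, ibd2 / genome_length
    return k1 / 2 + k2


print(relatedness_from_ibd([(0, 100, 1)], 100))              # parent-offspring: 0.5
print(relatedness_from_ibd([(0, 50, 1), (50, 75, 2)], 100))  # expected full-sib proportions: 0.5
```

Because realized $k_1$ and $k_2$ vary around their expectations, two dyads of the same pedigree class can yield different values of $r$, which is the within-class variation the abstract reports.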

12.

Proteinortho6: pseudo-reciprocal best alignment heuristic for graph-based detection of (co-)orthologs.

Klemm, Paul; Stadler, Peter F; Lechner, Marcus.

Front Bioinform ; 3: 1322477, 2023.

Article in English | MEDLINE | ID: mdl-38152702

ABSTRACT

Proteinortho is a widely used tool to predict (co-)orthologous groups of genes for any set of species. It finds application in comparative and functional genomics, phylogenomics, and evolutionary reconstructions. With a rapidly increasing number of available genomes, the demand for large-scale predictions is also growing. In this contribution, we evaluate and implement major algorithmic improvements that significantly enhance the speed of the analysis without reducing precision. Graph-based detection of (co-)orthologs is typically based on a reciprocal best alignment heuristic that requires an all-vs.-all comparison of proteins from all species under study. The initial identification of similar proteins is accelerated by introducing an alternative search tool along with a revised search strategy, the pseudo-reciprocal best alignment heuristic, which reduces the number of required sequence comparisons by half. The clustering algorithm was reworked to efficiently decompose very large clusters and accelerate processing. Proteinortho6 reduces the overall processing time by an order of magnitude compared to its predecessor while maintaining its small memory footprint and good predictive quality.
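The reciprocal best alignment idea that this abstract starts from can be sketched as plain reciprocal best hits (RBH) between two species. This is the textbook heuristic, not Proteinortho6's pseudo-reciprocal variant; the score table, protein names, and function name are hypothetical.

```python
def reciprocal_best_hits(scores):
    """Plain RBH between two species.

    scores[(a, b)]: alignment score of protein a (species A) against protein b
    (species B). Returns sorted (a, b) pairs where each is the other's best hit."""
    best_a, best_b = {}, {}
    for (a, b), s in scores.items():
        if s > best_a.get(a, (None, float("-inf")))[1]:
            best_a[a] = (b, s)
        if s > best_b.get(b, (None, float("-inf")))[1]:
            best_b[b] = (a, s)
    return sorted((a, b) for a, (b, _) in best_a.items() if best_b[b][0] == a)


scores = {("a1", "b1"): 100, ("a1", "b2"): 50, ("a2", "b1"): 80, ("a2", "b2"): 90}
print(reciprocal_best_hits(scores))  # [('a1', 'b1'), ('a2', 'b2')]
```

Ties are resolved by keeping the first hit seen, and every species pair needs its own score table; the all-vs.-all cost of producing those tables is what the search-strategy changes in the abstract target.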

13.

Limits of experimental evidence in RNA secondary structure prediction.

von Löhneysen, Sarah; Mörl, Mario; Stadler, Peter F.

Front Bioinform ; 4: 1346779, 2024.

Article in English | MEDLINE | ID: mdl-38456157
