Search | BVS Nicaragua

1.

Comprehensive survey of conserved RNA secondary structures in full-genome alignment of Hepatitis C virus.

Triebel, Sandra; Lamkiewicz, Kevin; Ontiveros, Nancy; Sweeney, Blake; Stadler, Peter F; Petrov, Anton I; Niepmann, Michael; Marz, Manja.

Sci Rep ; 14(1): 15145, 2024 07 02.

Article in English | MEDLINE | ID: mdl-38956134

ABSTRACT

Hepatitis C virus (HCV) is a plus-stranded RNA virus that often chronically infects liver hepatocytes and causes liver cirrhosis and cancer. These viruses replicate their genomes employing error-prone replicases. Thereby, they routinely generate a large 'cloud' of RNA genomes (quasispecies) which, by trial and error, comprehensively explore the sequence space available for functional RNA genomes that maintain the ability for efficient replication and immune escape. In this context, it is important to identify which RNA secondary structures in the sequence space of the HCV genome are conserved, likely due to functional requirements. Here, we provide the first genome-wide multiple sequence alignment (MSA) with the prediction of RNA secondary structures throughout all representative full-length HCV genomes. We selected 57 representative genomes by clustering all complete HCV genomes from the BV-BRC database on the basis of k-mer distributions and dimension reduction, and by adding RefSeq sequences. We include annotations of previously recognized features for easy comparison to other studies. Our results indicate that mainly the core coding region, the C-terminal NS5A region, and the NS5B region contain secondary structure elements that are conserved beyond coding sequence requirements, indicating functionality on the RNA level. In contrast, the genome regions in between contain less highly conserved structures. The results provide a complete description of all conserved RNA secondary structures and make clear that functionally important RNA secondary structures are present in certain HCV genome regions but are largely absent from other regions. Full-genome alignments of all branches of Hepacivirus C are provided in the supplement.

Subject(s)

Conserved Sequence, Viral Genome, Hepacivirus, Nucleic Acid Conformation, Viral RNA, Hepacivirus/genetics, Viral RNA/genetics, Viral RNA/chemistry, Humans, Sequence Alignment, Hepatitis C/virology, Hepatitis C/genetics
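The HCV genome selection described above starts from k-mer distributions. Below is a minimal Python sketch of that first step. It makes three assumptions: sequences are plain nucleotide strings, each genome is summarized by its normalized k-mer frequency vector, and genomes are compared by Euclidean distance. The choice of k, the distance measure and the function names are illustrative assumptions. The dimension reduction and clustering steps used in the study are not shown.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=3):
    """Normalized frequencies of all 4**k DNA k-mers in seq.

    k-mers containing characters other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in vocab)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[km] / total for km in vocab]


def profile_distance(a, b):
    """Euclidean distance between two k-mer profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(genomes, k=3):
    """Pairwise profile distances for a dict {name: sequence}."""
    names = sorted(genomes)
    profiles = {n: kmer_profile(genomes[n], k) for n in names}
    return names, [[profile_distance(profiles[a], profiles[b]) for b in names]
                   for a in names]
```

The profiles or the distance matrix could then be passed to a dimension-reduction and clustering step, and one representative could be picked per cluster.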

2.

Phylogenetic and Chemical Probing Information as Soft Constraints in RNA Secondary Structure Prediction.

von Löhneysen, Sarah; Spicher, Thomas; Varenyk, Yuliia; Yao, Hua-Ting; Lorenz, Ronny; Hofacker, Ivo; Stadler, Peter F.

J Comput Biol ; 31(6): 549-563, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38935442

ABSTRACT

Extrinsic, experimental information can be incorporated into thermodynamics-based RNA folding algorithms in the form of pseudo-energies. Evolutionary conservation of RNA secondary structure elements is detectable in alignments of phylogenetically related sequences and provides evidence for the presence of certain base pairs that can also be converted into pseudo-energy contributions. We show that the centroid base pairs computed from a consensus folding model such as RNAalifold result in a substantial improvement of the prediction accuracy for single sequences. Evidence for specific base pairs turns out to be more informative than a position-wise profile for the conservation of the pairing status. A comparison with chemical probing data, furthermore, strongly suggests that phylogenetic base pairing data are more informative than position-specific data on (un)pairedness as obtained from chemical probing experiments. In this context we demonstrate, in addition, that the conversion of signal from probing data into pseudo-energies is possible using thermodynamic structure predictions as a reference instead of known RNA structures.

Subject(s)

Algorithms, Nucleic Acid Conformation, Phylogeny, RNA, Thermodynamics, RNA/chemistry, RNA/genetics, Base Pairing, RNA Folding, Base Sequence, Computational Biology/methods
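A widely used example of turning probing signal into pseudo-energies is the conversion of Deigan et al. (2009). It maps a normalized reactivity r to m·ln(r + 1) + b, and the result is applied to nucleotides in stacked base pairs. The Python sketch below only illustrates this functional form; it is not the calibration procedure of the paper above. It uses ViennaRNA's documented default slope and intercept for this method (m = 1.8, b = -0.6 kcal/mol). Clipping negative reactivities to zero is an assumption of the sketch.

```python
import math


def deigan_pseudo_energy(reactivity, m=1.8, b=-0.6):
    """Pseudo-energy (kcal/mol) for one nucleotide: m * ln(r + 1) + b.

    Negative reactivities are clipped to 0 (assumption of this sketch).
    Low reactivity yields a negative (stabilizing) bonus; high reactivity
    yields a positive penalty against pairing.
    """
    r = max(0.0, reactivity)
    return m * math.log(r + 1.0) + b


def pseudo_energies(reactivities, m=1.8, b=-0.6):
    """Per-position pseudo-energies; None marks positions without data."""
    return [None if r is None else deigan_pseudo_energy(r, m, b)
            for r in reactivities]
```

For example, a reactivity of 2.0 gives about 1.38 kcal/mol. The bonus turns into a penalty at r = e^(1/3) - 1, roughly 0.40. Fitting m and b against a reference is exactly the step that the abstract proposes to perform with thermodynamic predictions instead of known structures.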

3.

Mechanistic insights into G-protein coupling with an agonist-bound G-protein-coupled receptor.

Batebi, Hossein; Pérez-Hernández, Guillermo; Rahman, Sabrina N; Lan, Baoliang; Kamprad, Antje; Shi, Mingyu; Speck, David; Tiemann, Johanna K S; Guixà-González, Ramon; Reinhardt, Franziska; Stadler, Peter F; Papasergi-Scott, Makaía M; Skiniotis, Georgios; Scheerer, Patrick; Kobilka, Brian K; Mathiesen, Jesper M; Liu, Xiangyu; Hildebrand, Peter W.

Nat Struct Mol Biol ; 2024 Jun 12.

Article in English | MEDLINE | ID: mdl-38867113

ABSTRACT

G-protein-coupled receptors (GPCRs) activate heterotrimeric G proteins by promoting guanine nucleotide exchange. Here, we investigate the coupling of G proteins with GPCRs and describe the events that ultimately lead to the ejection of GDP from its binding pocket in the Gα subunit, the rate-limiting step during G-protein activation. Using molecular dynamics simulations, we investigate the temporal progression of structural rearrangements of GDP-bound Gs protein (Gs·GDP; hereafter GsGDP) upon coupling to the β2-adrenergic receptor (β2AR) in atomic detail. The binding of GsGDP to the β2AR is followed by long-range allosteric effects that significantly reduce the energy needed for GDP release: the opening of α1-αF helices, the displacement of the αG helix and the opening of the α-helical domain. Signal propagation to Gs occurs through an extended receptor interface, including a lysine-rich motif at the intracellular end of a kinked transmembrane helix 6, which was confirmed by site-directed mutagenesis and functional assays. From this β2AR-GsGDP intermediate, Gs undergoes an in-plane rotation along the receptor axis to approach the β2AR-Gsempty state. The simulations shed light on how the structural elements at the receptor-G-protein interface may interact to transmit the signal over 30 Å to the nucleotide-binding site. Our analysis extends the current limited view of nucleotide-free snapshots to include additional states and structural features responsible for signaling and G-protein coupling specificity.

4.

Assessing the Quality of Cotranscriptional Folding Simulations.

Kühnl, Felix; Stadler, Peter F; Findeiß, Sven.

Methods Mol Biol ; 2726: 347-376, 2024.

Article in English | MEDLINE | ID: mdl-38780738

ABSTRACT

Structural changes in RNAs are an important contributor to controlling gene expression not only at the posttranscriptional stage but also during transcription. A subclass of riboswitches and RNA thermometers located in the 5' region of the primary transcript regulates the downstream functional unit, usually an ORF, through premature termination of transcription. Not only do such elements occur naturally, they are also attractive devices in synthetic biology. The ability to design such riboswitches or RNA thermometers is thus of considerable practical interest. Since these functional RNA elements already act during transcription, it is important to model and understand the dynamics of folding and, in particular, the formation of intermediate structures concurrently with transcription. Cotranscriptional folding simulations are therefore an important step to verify the functionality of design constructs before conducting expensive and labor-intensive wet lab experiments. For RNAs, full-fledged molecular dynamics simulations are far beyond practical reach because of both the size of the molecules and the timescales of interest. Even at the simplified level of secondary structures, further approximations are necessary. The BarMap approach is based on representing the secondary structure landscape for each individual transcription step by a coarse-grained representation that only retains a small set of low-energy local minima and the energy barriers between them. The folding dynamics between two transcriptional elongation steps is modeled as a Markov process on this representation. Maps between pairs of consecutive coarse-grained landscapes make it possible to follow the folding process as it changes in response to transcription elongation. In its original implementation, the BarMap software provides a general framework to investigate RNA folding dynamics on temporally changing landscapes. It is, however, difficult to use for specific scenarios such as cotranscriptional folding. To overcome this limitation, we developed the user-friendly BarMap-QA pipeline described in detail in this contribution. It is illustrated here by an elaborate example that emphasizes the careful monitoring of several quality measures. Using an iterative workflow, a reliable and complete kinetics simulation of a synthetic, transcription-regulating riboswitch is obtained using minimal computational resources. All programs and scripts used in this contribution are free software and available for download as a source distribution for Linux® or as a platform-independent Docker® image including support for Apple macOS® and Microsoft Windows®.

Subject(s)

Molecular Dynamics Simulation, Nucleic Acid Conformation, RNA Folding, Genetic Transcription, Riboswitch/genetics, RNA/chemistry, RNA/genetics, Software
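The core of the model described above is a Markov process on a coarse-grained landscape of local minima connected by energy barriers. Below is a Python sketch of that process for a single, fixed landscape. Several parts are assumptions of this sketch and are not taken from BarMap:

* Arrhenius-type rates k(i→j) = exp(-(S_ij - E_i)/RT) with a unit prefactor.
* Hypothetical minimum energies and saddle heights.
* Plain explicit Euler integration of the master equation.

BarMap itself also maps populations between consecutive landscapes as the transcript grows; that step is not shown.

```python
import math

RT = 0.6163  # kcal/mol, approximately R*T at 37 °C


def rate_matrix(energies, saddles, rt=RT):
    """Transition rates between minima from Arrhenius-type barrier crossing.

    energies: list of minimum free energies E_i.
    saddles: {(i, j): S_ij} saddle height between minima i and j (symmetric).
    rates[i][j] = exp(-(S_ij - E_i) / rt), with a unit prefactor.
    """
    n = len(energies)
    rates = [[0.0] * n for _ in range(n)]
    for (i, j), s in saddles.items():
        rates[i][j] = math.exp(-(s - energies[i]) / rt)
        rates[j][i] = math.exp(-(s - energies[j]) / rt)
    return rates


def evolve(p0, rates, t_end, dt=0.01):
    """Integrate the master equation (inflow minus outflow) with explicit Euler."""
    n = len(p0)
    p = list(p0)
    for _ in range(int(round(t_end / dt))):
        change = [0.0] * n
        for i in range(n):
            for j in range(n):
                flow = p[i] * rates[i][j] * dt  # probability moving i -> j
                change[i] -= flow
                change[j] += flow
        p = [p[k] + change[k] for k in range(n)]
    return p
```

Suppose the whole population starts in the higher of two minima (E = -10 and -9 kcal/mol, shared saddle at -8 kcal/mol). It relaxes towards the Boltzmann ratio exp(-1/RT), because rates defined this way satisfy detailed balance.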

5.

The Theory of Gene Family Histories.

Hellmuth, Marc; Stadler, Peter F.

Methods Mol Biol ; 2802: 1-32, 2024.

Article in English | MEDLINE | ID: mdl-38819554

ABSTRACT

Most genes are part of larger families of evolutionarily related genes. The history of gene families typically involves duplications and losses of genes as well as horizontal transfers into other organisms. The reconstruction of detailed gene family histories, i.e., the precise dating of evolutionary events relative to the phylogenetic tree of the underlying species, has remained a challenging topic despite its importance as a basis for detailed investigations into adaptation and functional evolution of individual members of the gene family. The identification of orthologs, moreover, is a particularly important subproblem of the more general setting considered here. In the last few years, an extensive body of mathematical results has appeared that tightly links orthology, a formal notion of best matches among genes, and horizontal gene transfer. The purpose of this chapter is to broadly outline some of the key mathematical insights and to discuss their implications for practical applications. In particular, we focus on tree-free methods, i.e., methods to infer orthology or horizontal gene transfer as well as gene trees, species trees, and reconciliations between them without using a priori knowledge of the underlying trees or statistical models for the inference of phylogenetic trees. Instead, the initial step aims to extract binary relations among genes.

Subject(s)

Molecular Evolution, Horizontal Gene Transfer, Multigene Family, Phylogeny, Genetic Models, Computational Biology/methods
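Tree-free methods start from binary relations such as best matches: y is a best match of x if no other gene from y's species is more closely related to x. Below is a minimal Python sketch of this relation. It assumes a pairwise similarity score as a stand-in for evolutionary relatedness; the score table and function names are hypothetical. Reciprocal best matches are the symmetric part of the relation.

```python
def best_matches(species_of, score):
    """Directed best-match relation.

    species_of: {gene: species}
    score: {(gene_a, gene_b): similarity}; looked up in either order.
    Returns the set of pairs (x, y) such that y has the maximal score to x
    among all genes of y's species (ties give several best matches).
    """
    def s(a, b):
        return score.get((a, b), score.get((b, a), float("-inf")))

    result = set()
    for x, sx in species_of.items():
        by_species = {}
        for y, sy in species_of.items():
            if sy != sx:
                by_species.setdefault(sy, []).append(y)
        for ys in by_species.values():
            best = max(s(x, y) for y in ys)
            if best == float("-inf"):
                continue  # no scored candidate in this species
            result.update((x, y) for y in ys if s(x, y) == best)
    return result


def reciprocal_best_matches(species_of, score):
    """Pairs (x, y) where each gene is a best match of the other."""
    bm = best_matches(species_of, score)
    return {(x, y) for (x, y) in bm if (y, x) in bm}
```

With genes a1 (species A) and b1, b2 (species B), and scores 0.9 for (a1, b1) and 0.5 for (a1, b2), a1 is the best match of both B genes. Only b1 is a best match of a1, so the reciprocal relation keeps just a1 and b1.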

6.

Comparative RNA Genomics.

Backofen, Rolf; Gorodkin, Jan; Hofacker, Ivo L; Stadler, Peter F.

Methods Mol Biol ; 2802: 347-393, 2024.

Article in English | MEDLINE | ID: mdl-38819565

ABSTRACT

Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible non-coding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a (necessarily incomplete) overview of the current state of comparative analysis of non-coding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.

Subject(s)

Computational Biology, Genomics, Humans, Computational Biology/methods, Genomics/methods, Nucleic Acid Conformation, RNA/genetics, Untranslated RNA/genetics, RNA Sequence Analysis/methods
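A basic check behind comparative structure analysis asks whether a candidate base pair is supported across an alignment, ideally by compensatory substitutions. Below is a simplified Python sketch of such a check. It counts only canonical pairs (Watson-Crick and G-U), and gap characters count as non-pairing. The function name is hypothetical, and consensus-folding tools such as RNAalifold use a more elaborate covariance score.

```python
# Canonical RNA base pairs: Watson-Crick plus G-U wobble.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_support(alignment, i, j):
    """Support for a base pair between alignment columns i and j (0-based).

    Returns (fraction of sequences able to form a canonical pair,
             number of distinct canonical pair types observed).
    More than one pair type indicates consistent or compensatory
    substitutions, i.e. covariation. DNA T is treated as U; gaps never pair.
    """
    pairs = [(s[i].upper().replace("T", "U"), s[j].upper().replace("T", "U"))
             for s in alignment]
    canonical = [p for p in pairs if p in CANONICAL]
    return len(canonical) / len(pairs), len(set(canonical))
```

For the aligned sequences GAAAC, CAAAG and GAAAU, columns 0 and 4 pair in every sequence with three different pair types (G-C, C-G, G-U). This kind of covariation is strong evidence for a conserved helix.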

7.

Cavity approach for the approximation of spectral density of graphs with heterogeneous structures.

Guzman, Grover E C; Stadler, Peter F; Fujita, Andre.

Phys Rev E ; 109(3-1): 034303, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38632720

ABSTRACT

Graphs have become widely used to represent and study social, biological, and technological systems. Statistical methods to analyze empirical graphs have been proposed based on the graph's spectral density. However, their running time is cubic in the number of vertices, precluding direct application to large instances. Thus, efficient algorithms to calculate the spectral density become necessary. For sparse graphs, the cavity method can efficiently approximate the spectral density of locally treelike undirected and directed graphs. However, it does not apply to most empirical graphs because they have heterogeneous structures. We therefore propose methods for undirected and directed graphs with heterogeneous structures that combine a new definition of a vertex's neighborhood with the cavity approach. The time and space complexities of our methods are $O(|E|\,h_{\max}^{3}\,t)$ and $O(|E|\,h_{\max}^{2}\,t)$, respectively, where $|E|$ is the number of edges, $h_{\max}$ is the size of the largest local neighborhood of a vertex, and $t$ is the number of iterations required for convergence. We demonstrate the practical efficacy by estimating the spectral density of simulated and real-world undirected and directed graphs.
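For orientation, here is the classic cavity recursion that the paper generalizes, written as a short Python sketch. It covers only the locally treelike, undirected case. For $z = \lambda + i\varepsilon$, each directed edge $i \to j$ carries a cavity resolvent $g_{i\to j} = 1/\big(z - \sum_{k \in \partial i \setminus \{j\}} g_{k \to i}\big)$. The density is then $\rho(\lambda) = -\frac{1}{\pi N}\sum_i \operatorname{Im}\big[1/\big(z - \sum_{k\in\partial i} g_{k\to i}\big)\big]$. The heterogeneous-neighborhood extension from the abstract is not implemented. The broadening $\varepsilon$ and the iteration count are assumptions of the sketch.

```python
import math
from collections import defaultdict


def cavity_spectral_density(edges, lam, eps=0.05, iters=200):
    """Approximate the adjacency spectral density at lam by the cavity method.

    edges: undirected edges (u, v) of a graph without isolated vertices.
    Exact on trees (given enough iterations); an approximation on locally
    treelike sparse graphs. This naive version costs O(sum of squared
    degrees) per iteration.
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    z = complex(lam, eps)
    # msg[(i, j)]: diagonal resolvent at i in the graph with neighbor j removed
    msg = {(i, j): 1 / z for i in nbrs for j in nbrs[i]}
    for _ in range(iters):
        msg = {(i, j): 1 / (z - sum(msg[(k, i)] for k in nbrs[i] if k != j))
               for (i, j) in msg}
    total = 0.0
    for i in nbrs:
        g_ii = 1 / (z - sum(msg[(k, i)] for k in nbrs[i]))
        total += g_ii.imag
    return -total / (math.pi * len(nbrs))
```

On a tree the recursion is exact. For example, on the path with three vertices (eigenvalues 0 and ±√2) it reproduces the exact spectrum broadened by Lorentzians of width ε.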

8.

Limits of experimental evidence in RNA secondary structure prediction.

von Löhneysen, Sarah; Mörl, Mario; Stadler, Peter F.

Front Bioinform ; 4: 1346779, 2024.

Article in English | MEDLINE | ID: mdl-38456157

9.

BioDeepfuse: a hybrid deep learning approach with integrated feature extraction techniques for enhanced non-coding RNA classification.

Avila Santos, Anderson P; de Almeida, Breno L S; Bonidia, Robson P; Stadler, Peter F; Stefanic, Polonca; Mandic-Mulec, Ines; Rocha, Ulisses; Sanches, Danilo S; de Carvalho, André C P L F.

RNA Biol ; 21(1): 1-12, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38528797

ABSTRACT

The accurate classification of non-coding RNA (ncRNA) sequences is pivotal for advanced non-coding genome annotation and analysis, a fundamental aspect of genomics that facilitates understanding of ncRNA functions and regulatory mechanisms in various biological processes. While traditional machine learning approaches have been employed for distinguishing ncRNA, these often necessitate extensive feature engineering. Recently, deep learning algorithms have provided advancements in ncRNA classification. This study presents BioDeepFuse, a hybrid deep learning framework integrating convolutional neural networks (CNN) or bidirectional long short-term memory (BiLSTM) networks with handcrafted features for enhanced accuracy. This framework employs a combination of k-mer one-hot, k-mer dictionary, and feature extraction techniques for input representation. Extracted features, when embedded into the deep network, enable optimal utilization of spatial and sequential nuances of ncRNA sequences. Using benchmark datasets and real-world RNA samples from bacterial organisms, we evaluated the performance of BioDeepFuse. Results exhibited high accuracy in ncRNA classification, underscoring the robustness of our tool in addressing complex ncRNA sequence data challenges. The effective melding of CNN or BiLSTM with external features heralds promising directions for future research, particularly in refining ncRNA classifiers and deepening insights into ncRNAs in cellular processes and disease manifestations. In addition to its original application in the context of bacterial organisms, the methodologies and techniques integrated into our framework can potentially render BioDeepFuse effective in various and broader domains.

Subject(s)

Deep Learning, Untranslated RNA/genetics, Algorithms, RNA, Neural Networks (Computer)
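BioDeepFuse's input representations include k-mer one-hot and k-mer dictionary encodings. The tool's exact definitions are not given in the abstract. The Python sketch below shows one common interpretation, and the function names are hypothetical. Overlapping k-mers over the RNA alphabet ACGU are each mapped to an integer index (the "dictionary" encoding). Each index can optionally be expanded to a one-hot vector of length 4^k.

```python
from itertools import product


def kmer_vocabulary(k):
    """Map every RNA k-mer (alphabet ACGU) to an integer index."""
    return {"".join(p): idx for idx, p in enumerate(product("ACGU", repeat=k))}


def kmer_indices(seq, k=2):
    """'Dictionary' encoding: index of each overlapping k-mer (-1 if unknown)."""
    vocab = kmer_vocabulary(k)
    seq = seq.upper().replace("T", "U")
    return [vocab.get(seq[i:i + k], -1) for i in range(len(seq) - k + 1)]


def kmer_one_hot(seq, k=2):
    """One-hot encoding: one row of length 4**k per overlapping k-mer."""
    size = 4 ** k
    rows = []
    for idx in kmer_indices(seq, k):
        row = [0] * size
        if idx >= 0:  # k-mers with ambiguous bases stay all-zero
            row[idx] = 1
        rows.append(row)
    return rows
```

Index sequences like these are typically fed to an embedding layer, and one-hot rows to a convolutional or recurrent layer. How BioDeepFuse combines them with its handcrafted features is not reproduced here.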

10.

Taking identity-by-descent analysis into the wild: Estimating realized relatedness in free-ranging macaques.

Freudiger, Annika; Jovanovic, Vladimir M; Huang, Yilei; Snyder-Mackler, Noah; Conrad, Donald F; Miller, Brian; Montague, Michael J; Westphal, Hendrikje; Stadler, Peter F; Bley, Stefanie; Horvath, Julie E; Brent, Lauren J N; Platt, Michael L; Ruiz-Lambides, Angelina; Tung, Jenny; Nowick, Katja; Ringbauer, Harald; Widdig, Anja.

bioRxiv ; 2024 Jan 11.

Article in English | MEDLINE | ID: mdl-38260273

ABSTRACT

Biological relatedness is a key consideration in studies of behavior, population structure, and trait evolution. Except for parent-offspring dyads, pedigrees capture relatedness imperfectly. The number and length of DNA segments that are identical-by-descent (IBD) yield the most precise estimates of relatedness. Here, we leverage novel methods for estimating locus-specific IBD from low coverage whole genome resequencing data to demonstrate the feasibility and value of resolving fine-scaled gradients of relatedness in free-living animals. Using primarily 4-6× coverage data from a rhesus macaque (Macaca mulatta) population with available long-term pedigree data, we show that we can call the number and length of IBD segments across the genome with high accuracy even at 0.5× coverage. The resulting estimates demonstrate substantial variation in genetic relatedness within kin classes, leading to overlapping distributions between kin classes. They identify cryptic genetic relatives that are not represented in the pedigree and reveal elevated recombination rates in females relative to males, which allows us to discriminate maternal and paternal kin using genotype data alone. Our findings represent a breakthrough in the ability to understand the predictors and consequences of genetic relatedness in natural populations, contributing to our understanding of a fundamental component of population structure in the wild.
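Given IBD segments called for a pair of individuals, realized relatedness can be summarized with the standard relation r = k1/2 + k2. Here k1 and k2 are the fractions of the genome shared IBD on one or on both haplotypes. Below is a minimal Python sketch of this relation. It assumes non-overlapping segments given in base pairs and a known total genome length. The estimator used in the study may differ.

```python
def ibd_fraction(segments, genome_length):
    """Fraction of the genome covered by non-overlapping (start, end) segments."""
    return sum(end - start for start, end in segments) / genome_length


def realized_relatedness(ibd1_segments, ibd2_segments, genome_length):
    """Coefficient of relatedness r = k1 / 2 + k2 from IBD segments.

    ibd1_segments: regions where one haplotype is shared IBD.
    ibd2_segments: regions where both haplotypes are shared IBD.
    """
    k1 = ibd_fraction(ibd1_segments, genome_length)
    k2 = ibd_fraction(ibd2_segments, genome_length)
    return k1 / 2 + k2
```

A parent-offspring pair shares one haplotype everywhere, so r = 0.5. Full siblings are expected to reach the same value through k1 ≈ 0.5 and k2 ≈ 0.25, but their realized values vary around that expectation, which is the within-kin-class variation the abstract reports.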

11.

MuDoGeR: Multi-Domain Genome recovery from metagenomes made easy.

Rocha, Ulisses; Coelho Kasmanas, Jonas; Kallies, René; Saraiva, Joao Pedro; Toscan, Rodolfo Brizola; Stefanic, Polonca; Bicalho, Marcos Fleming; Borim Correa, Felipe; Bastürk, Merve Nida; Fousekis, Efthymios; Viana Barbosa, Luiz Miguel; Plewka, Julia; Probst, Alexander J; Baldrian, Petr; Stadler, Peter F.

Mol Ecol Resour ; 24(2): e13904, 2024 Feb.

Article in English | MEDLINE | ID: mdl-37994269

ABSTRACT

Several computational frameworks and workflows exist that recover genomes of prokaryotes, eukaryotes and viruses from metagenomes. Yet, it is difficult for scientists with little bioinformatics experience to evaluate quality, annotate genes, dereplicate, assign taxonomy and calculate relative abundance and coverage of genomes belonging to different domains. MuDoGeR is a user-friendly tool tailored for those familiar with the Unix command-line environment that makes it easy to recover genomes of prokaryotes, eukaryotes and viruses from metagenomes, either alone or in combination. We tested MuDoGeR using 24 genomes from individual isolates and 574 metagenomes, demonstrating its applicability both to a few samples and to high-throughput analyses. While MuDoGeR can recover eukaryotic viral sequences, its characterization is predominantly skewed towards bacterial and archaeal viruses, reflecting the field's current state. However, acting as a dynamic wrapper, MuDoGeR is designed to constantly incorporate updates and integrate new tools, ensuring its ongoing relevance in the rapidly evolving field. MuDoGeR is open-source software available at https://github.com/mdsufz/MuDoGeR. It is also available as a Singularity container.

Subject(s)

Metagenome, Viruses, Metagenomics, Software, Bacteria/genetics, Phylogeny, Viruses/genetics
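The coverage and relative abundance mentioned above are commonly computed as follows. Coverage is the number of mapped read bases divided by the genome length, and relative abundance is each genome's share of the total coverage. Below is a minimal Python sketch of that arithmetic. It assumes read mapping has already produced the mapped-base counts. MuDoGeR acts as a wrapper around other tools, whose exact definitions may differ.

```python
def coverage_and_abundance(mapped_bases, genome_lengths):
    """Mean coverage and relative abundance per genome.

    mapped_bases: {genome: number of aligned read bases}
    genome_lengths: {genome: genome length in bases}
    coverage = mapped bases / genome length
    relative abundance = coverage / sum of all coverages
    """
    coverage = {g: mapped_bases.get(g, 0) / length
                for g, length in genome_lengths.items()}
    total = sum(coverage.values())
    abundance = {g: (c / total if total else 0.0) for g, c in coverage.items()}
    return coverage, abundance
```

Normalizing by genome length keeps a large genome from appearing more abundant simply because it attracts more reads.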

12.

Proteinortho6: pseudo-reciprocal best alignment heuristic for graph-based detection of (co-)orthologs.

Klemm, Paul; Stadler, Peter F; Lechner, Marcus.

Front Bioinform ; 3: 1322477, 2023.

Article in English | MEDLINE | ID: mdl-38152702

ABSTRACT

Proteinortho is a widely used tool to predict (co-)orthologous groups of genes for any set of species. It finds application in comparative and functional genomics, phylogenomics, and evolutionary reconstructions. With a rapidly increasing number of available genomes, the demand for large-scale predictions is also growing. In this contribution, we evaluate and implement major algorithmic improvements that significantly enhance the speed of the analysis without reducing precision. Graph-based detection of (co-)orthologs is typically based on a reciprocal best alignment heuristic that requires an all vs. all comparison of proteins from all species under study. The initial identification of similar proteins is accelerated by introducing an alternative search tool along with a revised search strategy, the pseudo-reciprocal best alignment heuristic, which reduces the number of required sequence comparisons by half. The clustering algorithm was reworked to efficiently decompose very large clusters and accelerate processing. Proteinortho6 reduces the overall processing time by an order of magnitude compared to its predecessor while maintaining its small memory footprint and good predictive quality.
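Graph-based (co-)ortholog detection turns reciprocal best alignment hits into edges and then groups connected genes. Below is a minimal Python sketch of the grouping step using union-find. It assumes the reciprocal best hits have already been computed, and the gene names are hypothetical. Proteinortho's reworked clustering additionally decomposes very large clusters, which is not shown.

```python
def connected_components(edges):
    """Group genes linked by reciprocal best hits (union-find).

    edges: iterable of (gene_a, gene_b) pairs.
    Returns the groups as sets, sorted for reproducible output.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for gene in list(parent):
        groups.setdefault(find(gene), set()).add(gene)
    return sorted(groups.values(), key=sorted)
```

Union-find keeps this step close to linear in the number of hits. The expensive part of the pipeline is therefore the similarity search that produces the hits, which is exactly what the pseudo-reciprocal heuristic targets.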

13.

RNA interaction format: a general data format for RNA interactions.

Schäfer, Richard A; Rabsch, Dominik; Scholz, Guillaume E; Stadler, Peter F; Hess, Wolfgang R; Backofen, Rolf; Fallmann, Jörg; Voß, Björn.

Bioinformatics ; 39(11)2023 11 01.

Article in English | MEDLINE | ID: mdl-37944046

ABSTRACT

SUMMARY: RNA molecules play crucial roles in various biological processes. They mediate their function mainly by interacting with other RNAs or proteins. At present, information about these interactions is distributed over different resources, often providing the data in simple tab-delimited formats that differ between the databases. There is no standardized data format that can capture the nature of all these different interactions in detail. AVAILABILITY AND IMPLEMENTATION: Here, we propose the RNA interaction format (RIF) for the detailed representation of RNA-RNA and RNA-Protein interactions and provide reference implementations in C/C++, Python, and JavaScript. RIF is released under the GNU General Public License version 3 (GNU GPLv3) and is available at https://github.com/RNABioInfo/rna-interaction-format.

Subject(s)

RNA, Software, Factual Databases, Proteins

14.

Relative timing information and orthology in evolutionary scenarios.

Schaller, David; Hartmann, Tom; Lafond, Manuel; Stadler, Peter F; Wieseke, Nicolas; Hellmuth, Marc.

Algorithms Mol Biol ; 18(1): 16, 2023 Nov 08.

Article in English | MEDLINE | ID: mdl-37940998

ABSTRACT

BACKGROUND: Evolutionary scenarios describing the evolution of a family of genes within a collection of species comprise the mapping of the vertices of a gene tree T to vertices and edges of a species tree S. The relative timing of the last common ancestor of two extant genes (leaves of T) and the last common ancestor of the two species (leaves of S) in which they reside is indicative of horizontal gene transfers (HGT) and ancient duplications. Orthologous gene pairs, on the other hand, require that their last common ancestor coincides with a corresponding speciation event. The relative timing information of gene and species divergences is captured by three colored graphs that have the extant genes as vertices and the species in which the genes are found as vertex colors: the equal-divergence-time (EDT) graph, the later-divergence-time (LDT) graph and the prior-divergence-time (PDT) graph, which together form an edge partition of the complete graph. RESULTS: Here we give a complete characterization in terms of informative and forbidden triples that can be read off the three graphs and provide a polynomial-time algorithm for constructing an evolutionary scenario that explains the graphs, provided such a scenario exists. While both LDT and PDT graphs are cographs, this is not true for the EDT graph in general. We show that every EDT graph is perfect. While the information about LDT and PDT graphs is necessary to recognize EDT graphs in polynomial time for general scenarios, this extra information can be dropped in the HGT-free case. However, recognition of EDT graphs without knowledge of putative LDT and PDT graphs is NP-complete for general scenarios. In contrast, PDT graphs can be recognized in polynomial time. We finally connect the EDT graph to the alternative definitions of orthology that have been proposed for scenarios with horizontal gene transfer. With one exception, the corresponding graphs are shown to be colored cographs.
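Once divergence times are known, the three relations can be read off directly. Below is a minimal Python sketch of this classification. It assumes two lookup tables as hypothetical inputs:

* For each gene pair, the age (time before present) of its last common ancestor in the gene tree.
* For each species pair, the age of its divergence in the species tree.

The paper addresses the reverse problem of characterizing and explaining such graphs, which is not attempted here. Genes from the same species are compared against age 0, so with positive gene divergence ages their pairs fall into the PDT graph.

```python
import math
from itertools import combinations


def divergence_time_graphs(species_of, gene_age, species_age, tol=1e-9):
    """Split all gene pairs into EDT, LDT and PDT edge sets.

    species_of: {gene: species}
    gene_age: {frozenset({x, y}): age of lca_T(x, y)}
    species_age: {frozenset({A, B}): age of lca_S(A, B)} for A != B
    Ages are times before present, so a smaller age means a later divergence.
    Pairs of genes from the same species are compared against age 0.
    """
    edt, ldt, pdt = set(), set(), set()
    for x, y in combinations(sorted(species_of), 2):
        pair = frozenset((x, y))
        sx, sy = species_of[x], species_of[y]
        t_species = 0.0 if sx == sy else species_age[frozenset((sx, sy))]
        t_gene = gene_age[pair]
        if math.isclose(t_gene, t_species, abs_tol=tol):
            edt.add(pair)  # genes diverged together with their species
        elif t_gene < t_species:
            ldt.add(pair)  # gene divergence later than species divergence
        else:
            pdt.add(pair)  # gene divergence prior to species divergence
    return edt, ldt, pdt
```

Consider gene a in species A and genes b and c in species B, with the gene tree ((a,b) at age 10, c) at age 15 and the species split at age 10. Then (a, b) is an EDT edge, consistent with orthology. Both pairs involving c are PDT edges, the signature of a duplication older than the speciation. A gene pair whose divergence is younger than its species split would be an LDT edge, as expected after HGT.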

15.

The AnimalAssociatedMetagenomeDB reveals a bias towards livestock and developed countries and blind spots in functional-potential studies of animal-associated microbiomes.

Avila Santos, Anderson Paulo; Kabiru Nata'ala, Muhammad; Kasmanas, Jonas Coelho; Bartholomäus, Alexander; Keller-Costa, Tina; Jurburg, Stephanie D; Tal, Tamara; Camarinha-Silva, Amélia; Saraiva, João Pedro; Ponce de Leon Ferreira de Carvalho, André Carlos; Stadler, Peter F; Sipoli Sanches, Danilo; Rocha, Ulisses.

Anim Microbiome ; 5(1): 48, 2023 Oct 05.

Article in English | MEDLINE | ID: mdl-37798675

ABSTRACT

BACKGROUND: Metagenomic data can shed light on animal-microbiome relationships and the functional potential of these communities. Over the past years, the generation of metagenomics data has increased exponentially, and so has the availability and reusability of data present in public repositories. However, identifying which datasets and associated metadata are available is not straightforward. We created the Animal-Associated Metagenome Metadata Database (AnimalAssociatedMetagenomeDB - AAMDB) to facilitate the identification and reuse of publicly available non-human, animal-associated metagenomic data and metadata. Further, we used the AAMDB to (i) annotate common and scientific names of the species; (ii) determine the fraction of vertebrates and invertebrates; (iii) study their biogeography; and (iv) specify whether the animals were wild, pets, livestock or used for medical research. RESULTS: We manually selected metagenomes associated with non-human animals from SRA and MG-RAST. Next, we standardized and curated 51 metadata attributes (e.g., host, compartment, geographic coordinates, and country). The AAMDB version 1.0 contains 10,885 metagenomes associated with 165 different species from 65 different countries. Of the collected metagenomes, 51.1% were recovered from animals associated with medical research or grown for human consumption (i.e., mice, rats, cattle, pigs, and poultry). Further, we observed an over-representation of animals collected in temperate regions (89.2%) and a lower representation of samples from the polar zones, with only 11 samples in total. The most common genus among invertebrate animals was Trichocerca (rotifers). CONCLUSION: Our work may guide host species selection in novel animal-associated metagenome research, especially in biodiversity and conservation studies. The data available in our database will allow scientists to perform meta-analyses and test new hypotheses (e.g., host-specificity, strain heterogeneity, and biogeography of animal-associated metagenomes), leveraging existing data. The AAMDB WebApp is a user-friendly interface that is publicly available at https://webapp.ufz.de/aamdb/ .

16.

Tailored machine learning models for functional RNA detection in genome-wide screens.

Klapproth, Christopher; Zötzsche, Siegfried; Kühnl, Felix; Fallmann, Jörg; Stadler, Peter F; Findeiß, Sven.

NAR Genom Bioinform ; 5(3): lqad072, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37608800

ABSTRACT

The in silico prediction of non-coding and protein-coding genetic loci has received considerable attention in comparative genomics aiming in particular at the identification of properties of nucleotide sequences that are informative of their biological role in the cell. We present here a software framework for the alignment-based training, evaluation and application of machine learning models with user-defined parameters. Instead of focusing on the one-size-fits-all approach of pervasive in silico annotation pipelines, we offer a framework for the structured generation and evaluation of models based on arbitrary features and input data, focusing on stable and explainable results. Furthermore, we showcase the usage of our software package in a full-genome screen of Drosophila melanogaster and evaluate our results against the well-known but much less flexible program RNAz.

17.

RNAcode_Web - Convenient identification of evolutionary conserved protein coding regions.

Anders, John; Stadler, Peter F.

J Integr Bioinform ; 20(3)2023 Sep 01.

Article in English | MEDLINE | ID: mdl-37615674

ABSTRACT

The differentiation of regions with coding potential from non-coding regions remains a key task in computational biology. Methods such as RNAcode that exploit patterns of sequence conservation for this task have a substantial advantage in classification accuracy, in particular for short coding sequences, compared to methods that rely on a single input sequence. However, they require sequence alignments as input. Frequently, suitable multiple sequence alignments are not readily available and are tedious and sometimes difficult to construct. We therefore introduce here a new web service that provides access to the well-known coding sequence detector RNAcode with minimal user overhead. It requires as input only a single target nucleotide sequence. The service automates the collection, selection, and preparation of homologous sequences from the NCBI database, as well as the construction of the multiple sequence alignments that are needed as input for RNAcode. By handling the entire pre- and postprocessing, it makes the investigation of specific genomic regions for previously unannotated coding regions, such as small peptides or additional introns, a simple task that is easily accessible to non-expert users. RNAcode_Web is accessible online at rnacode.bioinf.uni-leipzig.de.

Subject(s)

Genomics, Software, Open Reading Frames, Sequence Alignment, Computational Biology/methods

18.

Clustering systems of phylogenetic networks.

Hellmuth, Marc; Schaller, David; Stadler, Peter F.

Theory Biosci ; 142(4): 301-358, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37573261

ABSTRACT

Rooted acyclic graphs appear naturally when the phylogenetic relationship of a set X of taxa involves not only speciations but also recombination, horizontal transfer, or hybridization that cannot be captured by trees. A variety of classes of such networks have been discussed in the literature, including phylogenetic, level-1, tree-child, tree-based, galled tree, regular, or normal networks as models of different types of evolutionary processes. Clusters arise in models of phylogeny as the sets $\mathscr{C}(v)$ of descendant taxa of a vertex v. The clustering system $\mathscr{C}_N$ comprising the clusters of a network N conveys key information on N itself. In the special case of rooted phylogenetic trees, T is uniquely determined by its clustering system $\mathscr{C}_T$. Although this is no longer true for networks in general, it is of interest to relate properties of N and $\mathscr{C}_N$. Here, we systematically investigate the relationships of several well-studied classes of networks and their clustering systems. The main results are correspondences of classes of networks and clustering systems of the following form: If N is a network of type $\mathbb{A}$, then $\mathscr{C}_N$ satisfies $\mathbb{B}$, and conversely if $\mathscr{C}$ is a clustering system satisfying $\mathbb{B}$ then there is a network N of type $\mathbb{A}$ such that [Formula: see text]. This, in turn, allows us to investigate the mutual dependencies between the distinct types of networks in detail.
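The clustering system of a network collects, for every vertex v, the set of leaves (taxa) reachable from v. Below is a minimal Python sketch for a rooted network given as a child-list dictionary, which is a hypothetical input format. In the example network, a hybrid vertex h with two parents produces overlapping clusters. The result therefore need not be a hierarchy, as it always would be for a tree.

```python
def clustering_system(children):
    """All clusters C(v) of a rooted network.

    children: {vertex: list of child vertices}; vertices without children
    are leaves (taxa). C(v) is the set of leaves reachable from v.
    Clusters are memoized, so shared (hybrid) vertices are computed once.
    """
    memo = {}

    def cluster(v):
        if v not in memo:
            kids = children.get(v, [])
            if kids:
                memo[v] = frozenset().union(*(cluster(c) for c in kids))
            else:
                memo[v] = frozenset([v])
        return memo[v]

    vertices = set(children) | {c for kids in children.values() for c in kids}
    return {cluster(v) for v in vertices}
```

For the tree {"r": ["u", "c"], "u": ["a", "b"]}, the clusters are {a}, {b}, {c}, {a, b} and {a, b, c}. Now consider the network {"r": ["u", "w"], "u": ["a", "h"], "w": ["h", "c"], "h": ["b"]}. It yields both {a, b} and {b, c}, two overlapping clusters that no tree can display.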

19.

StandEnA: a customizable workflow for standardized annotation and generating a presence-absence matrix of proteins.

Chafra, Fatma; Borim Correa, Felipe; Oni, Faith; Konu Karakayali, Özlen; Stadler, Peter F; Nunes da Rocha, Ulisses.

Bioinform Adv ; 3(1): vbad069, 2023.

Article in English | MEDLINE | ID: mdl-37448812

ABSTRACT

Motivation: Several genome annotation tools standardize annotation outputs for comparability. During standardization, these tools do not allow user-friendly customization of annotation databases, limiting their flexibility and applicability in downstream analysis. Results: StandEnA is a user-friendly command-line tool for Linux that facilitates the generation of custom databases by retrieving protein sequences from multiple databases. Directed by a user-defined list of standard names, StandEnA retrieves synonyms to search for corresponding sequences in a set of public databases. Custom databases are used in prokaryotic genome annotation to generate standardized presence-absence matrices and reference files containing standard database identifiers. To showcase StandEnA, we applied it to six metagenome-assembled genomes to analyze three different pathways. Availability and implementation: StandEnA is open-source software available at https://github.com/mdsufz/StandEnA. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
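The last step of such a workflow turns standardized annotations into a presence-absence matrix, and it can be sketched in a few lines of Python. In this hypothetical sketch the synonym lists are supplied by hand, and annotation labels are matched by name, ignoring case. StandEnA itself retrieves synonyms and sequences from public databases and annotates genomes against the resulting custom database; the sketch does not attempt that.

```python
def presence_absence(genome_annotations, synonyms):
    """Presence-absence matrix over user-defined standard protein names.

    genome_annotations: {genome: list of annotation labels}
    synonyms: {standard name: set of alternative labels} (user-supplied here)
    Matching is case-insensitive. Returns (standard names, {genome: 0/1 row}).
    """
    lookup = {}
    for standard, alternatives in synonyms.items():
        lookup[standard.lower()] = standard
        for alt in alternatives:
            lookup[alt.lower()] = standard
    names = list(synonyms)
    matrix = {}
    for genome, labels in genome_annotations.items():
        found = {lookup[label.lower()] for label in labels if label.lower() in lookup}
        matrix[genome] = [int(name in found) for name in names]
    return names, matrix
```

Mapping every synonym to one standard name before building the matrix is what makes rows from different genomes, or from different annotation sources, comparable.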

20.

Local RNA folding revisited.

Waldl, Maria; Spicher, Thomas; Lorenz, Ronny; Beckmann, Irene K; Hofacker, Ivo L; Löhneysen, Sarah Von; Stadler, Peter F.

J Bioinform Comput Biol ; 21(4): 2350016, 2023 08.

Article in English | MEDLINE | ID: mdl-37522173

ABSTRACT

Most of the functional RNA elements located within large transcripts are local. Local folding therefore serves as a practically useful approximation to global structure prediction. Due to the sensitivity of RNA secondary structure prediction to the exact definition of sequence ends, accuracy can be increased by averaging local structure predictions over multiple, overlapping sequence windows. These averages can be computed efficiently by dynamic programming. Here we revisit the local folding problem, present a concise mathematical formalization that generalizes previous approaches and show that correct Boltzmann samples can be obtained by local stochastic backtracing in McCaskill's algorithms but not from local folding recursions. Corresponding new features are implemented in the ViennaRNA package to improve the support of local folding. Applications include the computation of maximum expected accuracy structures from RNAplfold data and a mutual information measure to quantify the sensitivity of individual sequence positions.

Subject(s)

RNA Folding, RNA, Nucleic Acid Conformation, RNA/chemistry, Algorithms, Untranslated RNA
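The window averaging described above assigns each base pair the mean of its probabilities over all windows of size W that contain it, with base-pair span at most L. Below is a naive Python sketch of that averaging. It relies on a hypothetical per-window predictor passed in as a function. The sketch recomputes every window explicitly; it is not the dynamic programming used in ViennaRNA.

```python
from collections import defaultdict


def window_averaged_pair_probs(n, window, max_span, window_probs):
    """Average base-pair probabilities over all windows containing each pair.

    n: sequence length; window: window size W; max_span: maximal j - i (L).
    window_probs(start): hypothetical per-window predictor returning
    {(i, j): p} in 0-based sequence coordinates inside [start, start + W).
    Pairs absent from a window's prediction count as probability 0 there.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for start in range(n - window + 1):
        probs = window_probs(start)
        end = start + window  # exclusive
        for i in range(start, end):
            for j in range(i + 1, min(i + max_span, end - 1) + 1):
                sums[(i, j)] += probs.get((i, j), 0.0)
                counts[(i, j)] += 1
    return {pair: sums[pair] / counts[pair] for pair in counts}
```

A pair predicted in only one of the two windows that contain it ends up with an averaged probability of 0.5. This is how averaging dampens artifacts caused by where a window happens to end.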
