Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Resultados 1 - 15 de 15
Filtrar
1.
Proc Natl Acad Sci U S A ; 120(41): e2307289120, 2023 10 10.
Artículo en Inglés | MEDLINE | ID: mdl-37788315

RESUMEN

The importance of whole-genome duplication (WGD) for evolution is controversial. Whereas some view WGD mainly as detrimental and an evolutionary dead end, there is growing evidence that polyploidization can help overcome environmental change, stressful conditions, or periods of extinction. However, despite much research, the mechanistic underpinnings of why and how polyploids might be able to outcompete or outlive nonpolyploids at times of environmental upheaval remain elusive, especially for autopolyploids, in which heterosis effects are limited. On the longer term, WGD might increase both mutational and environmental robustness due to redundancy and increased genetic variation, but on the short-or even immediate-term, selective advantages of WGDs are harder to explain. Here, by duplicating artificially generated Gene Regulatory Networks (GRNs), we show that duplicated GRNs-and thus duplicated genomes-show higher signal output variation than nonduplicated GRNs. This increased variation leads to niche expansion and can provide polyploid populations with substantial advantages to survive environmental turmoil. In contrast, under stable environments, GRNs might be maladaptive to changes, a phenomenon that is exacerbated in duplicated GRNs. We believe that these results provide insights into how genome duplication and (auto)polyploidy might help organisms to adapt quickly to novel conditions and to survive ecological uproar or even cataclysmic events.


Asunto(s)
Duplicación de Gen , Redes Reguladoras de Genes , Humanos , Genoma , Poliploidía , Evolución Molecular , Genoma de Planta/genética
2.
BMC Bioinformatics ; 21(1): 402, 2020 Sep 14.
Artículo en Inglés | MEDLINE | ID: mdl-32928110

RESUMEN

BACKGROUND: De Bruijn graphs are key data structures for the analysis of next-generation sequencing data. They efficiently represent the overlap between reads and hence, also the underlying genome sequence. However, sequencing errors and repeated subsequences render the identification of the true underlying sequence difficult. A key step in this process is the inference of the multiplicities of nodes and arcs in the graph. These multiplicities correspond to the number of times each k-mer (resp. k+1-mer) implied by a node (resp. arc) is present in the genomic sequence. Determining multiplicities thus reveals the repeat structure and presence of sequencing errors. Multiplicities of nodes/arcs in the de Bruijn graph are reflected in their coverage, however, coverage variability and coverage biases render their determination ambiguous. Current methods to determine node/arc multiplicities base their decisions solely on the information in nodes and arcs individually, under-utilising the information present in the sequencing data. RESULTS: To improve the accuracy with which node and arc multiplicities in a de Bruijn graph are inferred, we developed a conditional random field (CRF) model to efficiently combine the coverage information within each node/arc individually with the information of surrounding nodes and arcs. Multiplicities are thus collectively assigned in a more consistent manner. CONCLUSIONS: We demonstrate that the CRF model yields significant improvements in accuracy and a more robust expectation-maximisation parameter estimation. True k-mers can be distinguished from erroneous k-mers with a higher F1 score than existing methods. A C++11 implementation is available at https://github.com/biointec/detox under the GNU AGPL v3.0 license.


Asunto(s)
Biología Computacional/métodos , Genómica/métodos , Secuenciación de Nucleótidos de Alto Rendimiento/métodos , Algoritmos , Humanos
3.
BMC Bioinformatics ; 20(1): 27, 2019 Jan 15.
Artículo en Inglés | MEDLINE | ID: mdl-30646859

RESUMEN

BACKGROUND: Graphlets are useful for bioinformatics network analysis. Based on the structure of Hocevar and Demsar's ORCA algorithm, we have created an orbit counting algorithm, named Jesse. This algorithm, like ORCA, uses equations to count the orbits, but unlike ORCA it can count graphlets of any order. To do so, it generates the required internal structures and equations automatically. Many more redundant equations are generated, however, and Jesse's running time is highly dependent on which of these equations are used. Therefore, this paper aims to investigate which equations are most efficient, and which factors have an effect on this efficiency. RESULTS: With appropriate equation selection, Jesse's running time may be reduced by a factor of up to 2 in the best case, compared to using randomly selected equations. Which equations are most efficient depends on the density of the graph, but barely on the graph type. At low graph density, equations with terms in their right-hand side with few arguments are more efficient, whereas at high density, equations with terms with many arguments in the right-hand side are most efficient. At a density between 0.6 and 0.7, both types of equations are about equally efficient. CONCLUSIONS: Our Jesse algorithm became up to a factor 2 more efficient, by automatically selecting the best equations based on graph density. It was adapted into a Cytoscape App that is freely available from the Cytoscape App Store to ease application by bioinformaticians.


Asunto(s)
Algoritmos , Biología Computacional/métodos , Gráficos por Computador , Interpretación Estadística de Datos , Diabetes Mellitus/metabolismo , Mapeo de Interacción de Proteínas/métodos , Proteínas/metabolismo , Simulación por Computador , Humanos , Modelos Biológicos
4.
Bioinformatics ; 34(8): 1372-1380, 2018 04 15.
Artículo en Inglés | MEDLINE | ID: mdl-29186327

RESUMEN

Motivation: Graphlets are a useful tool to determine a graph's small-scale structure. Finding them is exponentially hard with respect to the number of nodes in each graphlet. Therefore, equations can be used to reduce the size of graphlets that need to be enumerated to calculate the number of each graphlet touching each node. Hocevar and Demsar first introduced such equations, which were derived manually, and an algorithm that uses them, but only graphlets with four or five nodes can be counted this way. Results: We present a new algorithm for orbit counting, which is applicable to graphlets of any order. This algorithm uses a tree structure to simplify finding orbits, and stabilizers and symmetry-breaking constraints to ensure correctness. This method gives a significant speedup compared to a brute force counting method and can count orbits beyond the capacity of other available tools. Availability and implementation: An implementation of the algorithm can be found at https://github.com/biointec/jesse. Contact: pieter.audenaert@ugent.be.


Asunto(s)
Algoritmos , Biología Computacional/métodos , Gráficos por Computador
5.
Bioinformatics ; 33(17): 2740-2742, 2017 Sep 01.
Artículo en Inglés | MEDLINE | ID: mdl-28472230

RESUMEN

MOTIVATION: The Bionano Genomics platform allows for the optical detection of short sequence patterns in very long DNA molecules (up to 2.5 Mbp). Molecules with overlapping patterns can be assembled to generate a consensus optical map of the entire genome. In turn, these optical maps can be used to validate or improve de novo genome assembly projects or to detect large-scale structural variation in genomes. Simulated optical map data can assist in the development and benchmarking of tools that operate on those data, such as alignment and assembly software. Additionally, it can help to optimize the experimental setup for a genome of interest. Such a simulator is currently not available. RESULTS: We have developed a simulator, OMSim, that produces synthetic optical map data that mimics real Bionano Genomics data. These simulated data have been tested for compatibility with the Bionano Genomics Irys software system and the Irys-scaffolding scripts. OMSim is capable of handling very large genomes (over 30 Gbp) with high throughput and low memory requirements. AVAILABILITY AND IMPLEMENTATION: The Python simulation tool and a cross-platform graphical user interface are available as open source software under the GNU GPL v2 license ( http://www.bioinformatics.intec.ugent.be/omsim ). CONTACT: jan.fostier@ugent.be. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Genoma Humano , Análisis de Secuencia de ADN/métodos , Programas Informáticos , Genómica/métodos , Humanos
6.
Bioinformatics ; 33(3): 461-463, 2017 02 01.
Artículo en Inglés | MEDLINE | ID: mdl-28158465

RESUMEN

Summary: We present a Cytoscape app for the ISMAGS algorithm, which can enumerate all instances of a motif in a graph, making optimal use of the motif's symmetries to make the search more efficient. The Cytoscape app provides a handy interface for this algorithm, which allows more efficient network analysis. Availability and Implementation: The Cytoscape app for ISMAGS can be freely downloaded from the Cytoscape App store http://apps.cytoscape.org/apps/ismags. Source code and documentation for ISMAGS are available at https://github.com/biointec/ismags. Source code and documentation for the Cytoscape app are available at https://gitlab.psb.ugent.be/thpar/ISMAGS_Cytoscape. Contacts: Pieter.Audenaert@intec.ugent.be or Yves.VanDePeer@psb.vib-ugent.be Supplementary information: Supplementary data are available at Bioinformatics online.


Asunto(s)
Biología Computacional/métodos , Programas Informáticos , Algoritmos , Gráficos por Computador
7.
Bioinformatics ; 31(23): 3758-66, 2015 Dec 01.
Artículo en Inglés | MEDLINE | ID: mdl-26254488

RESUMEN

MOTIVATION: The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected. RESULTS: We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O.sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z.mays. AVAILABILITY AND IMPLEMENTATION: BLSSpeller was written in Java. Source code and manual are available at http://bioinformatics.intec.ugent.be/blsspeller CONTACT: Klaas.Vandepoele@psb.vib-ugent.be or jan.fostier@intec.ugent.be. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Algoritmos , Genoma de Planta , Regiones Promotoras Genéticas , Análisis de Secuencia de ADN/métodos , Secuencia de Bases , Sitios de Unión , Secuencia Conservada , ADN de Plantas/química , Motivos de Nucleótidos , Alineación de Secuencia , Programas Informáticos , Factores de Transcripción/metabolismo
8.
IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 1995-2006, 2023.
Artículo en Inglés | MEDLINE | ID: mdl-37015543

RESUMEN

In de novo genome assembly using short Illumina reads, the accurate determination of node and arc multiplicities in a de Bruijn graph has a large impact on the quality and contiguity of the assembly. The multiplicity estimates of nodes and arcs guide the cleaning of the de Bruijn graph by identifying spurious nodes and arcs that correspond to sequencing errors. Additionally, they can be used to guide repeat resolution. Here, we model the entire de Bruijn graph and the accompanying read coverage information with a single Conditional Random Field (CRF) model. We show that approximate inference using Loopy Belief Propagation (LBP) on our model improves multiplicity assignment accuracy within feasible runtimes. The order in which messages are passed has a large influence on the speed of LBP convergence. Little theoretical guarantees exist and the conditions for convergence are not easily checked as our CRF model contains higher-order interactions. Therefore, we also present an empirical evaluation of several message passing schemes that may guide future users of LBP on CRFs with higher-order interactions in their choice of message passing scheme.


Asunto(s)
Algoritmos , Fatiga , Humanos , Análisis de Secuencia de ADN , Secuenciación de Nucleótidos de Alto Rendimiento , Programas Informáticos
9.
Bioinformatics ; 27(11): 1587-8, 2011 Jun 01.
Artículo en Inglés | MEDLINE | ID: mdl-21478195

RESUMEN

SUMMARY: Network motifs in integrated molecular networks represent functional relationships between distinct data types. They aggregate to form dense topological structures corresponding to functional modules which cannot be detected by traditional graph clustering algorithms. We developed CyClus3D, a Cytoscape plugin for clustering composite three-node network motifs using a 3D spectral clustering algorithm. AVAILABILITY: Via the Cytoscape plugin manager or http://bioinformatics.psb.ugent.be/software/details/CyClus3D.


Asunto(s)
Algoritmos , Redes Reguladoras de Genes , Análisis por Conglomerados , Modelos Biológicos , Mapeo de Interacción de Proteínas , Transducción de Señal , Programas Informáticos
10.
PLoS One ; 17(6): e0270147, 2022.
Artículo en Inglés | MEDLINE | ID: mdl-35709229

RESUMEN

Many real-life problems boil down to a variant of the Minimum Steiner Tree Problem (STP). In telecommunications, Fiber-To-The-Home (FTTH) houses are clustered so they can be connected with fiber as cost-efficiently as possible. The cost calculation of a fiber installment can be formulated as a capacitated STP. Often, STP variants are solved with integer linear programs, which provide excellent solutions, though the running time costs increase quickly with graph size. Some geographical areas require graphs of over 20000 nodes-typically unattainable for integer linear programs. This paper presents an alternative approach. It extends the shortest path heuristic for the STP to a new heuristic that can construct solutions for the capacitated STP: the Capacitated Shortest Path Heuristic (CSPH). It is straightforward to implement, allowing many extensions. In experiments on realistic telecommunications datasets, CSPH finds solutions on average in time O(|V|2), quadratic in the number of nodes, making it possible to solve 50000 node graphs in under a minute.


Asunto(s)
Telecomunicaciones , Algoritmos
11.
Front Genet ; 10: 1196, 2019.
Artículo en Inglés | MEDLINE | ID: mdl-31921278

RESUMEN

Studying the impact of genetic variation on gene regulatory networks is essential to understand the biological mechanisms by which genetic variation causes variation in phenotypes. Bayesian networks provide an elegant statistical approach for multi-trait genetic mapping and modelling causal trait relationships. However, inferring Bayesian gene networks from high-dimensional genetics and genomics data is challenging, because the number of possible networks scales super-exponentially with the number of nodes, and the computational cost of conventional Bayesian network inference methods quickly becomes prohibitive. We propose an alternative method to infer high-quality Bayesian gene networks that easily scales to thousands of genes. Our method first reconstructs a node ordering by conducting pairwise causal inference tests between genes, which then allows to infer a Bayesian network via a series of independent variable selection problems, one for each gene. We demonstrate using simulated and real systems genetics data that this results in a Bayesian network with equal, and sometimes better, likelihood than the conventional methods, while having a significantly higher overlap with groundtruth networks and being orders of magnitude faster. Moreover our method allows for a unified false discovery rate control across genes and individual edges, and thus a rigorous and easily interpretable way for tuning the sparsity level of the inferred network. Bayesian network inference using pairwise node ordering is a highly efficient approach for reconstructing gene regulatory networks when prior information for the inclusion of edges exists or can be inferred from the available data.

12.
PLoS One ; 11(1): e0147078, 2016.
Artículo en Inglés | MEDLINE | ID: mdl-26797021

RESUMEN

Graphlets are small subgraphs, usually containing up to five vertices, that can be found in a larger graph. Identification of the graphlets that a vertex in an explored graph touches can provide useful information about the local structure of the graph around that vertex. Actually finding all graphlets in a large graph can be time-consuming, however. As the graphlets grow in size, more different graphlets emerge and the time needed to find each graphlet also scales up. If it is not needed to find each instance of each graphlet, but knowing the number of graphlets touching each node of the graph suffices, the problem is less hard. Previous research shows a way to simplify counting the graphlets: instead of looking for the graphlets needed, smaller graphlets are searched, as well as the number of common neighbors of vertices. Solving a system of equations then gives the number of times a vertex is part of each graphlet of the desired size. However, until now, equations only exist to count graphlets with 4 or 5 nodes. In this paper, two new techniques are presented. The first allows to generate the equations needed in an automatic way. This eliminates the tedious work needed to do so manually each time an extra node is added to the graphlets. The technique is independent on the number of nodes in the graphlets and can thus be used to count larger graphlets than previously possible. The second technique gives all graphlets a unique ordering which is easily extended to name graphlets of any size. Both techniques were used to generate equations to count graphlets with 4, 5 and 6 vertices, which extends all previous results. Code can be found at https://github.com/IneMelckenbeeck/equation-generator and https://github.com/IneMelckenbeeck/graphlet-naming.


Asunto(s)
Algoritmos , Gráficos por Computador , Simulación por Computador , Interpretación Estadística de Datos , Humanos , Modelos Biológicos
13.
Algorithms Mol Biol ; 11: 10, 2016.
Artículo en Inglés | MEDLINE | ID: mdl-27148393

RESUMEN

BACKGROUND: Third generation sequencing platforms produce longer reads with higher error rates than second generation technologies. While the improved read length can provide useful information for downstream analysis, underlying algorithms are challenged by the high error rate. Error correction methods in which accurate short reads are used to correct noisy long reads appear to be attractive to generate high-quality long reads. Methods that align short reads to long reads do not optimally use the information contained in the second generation data, and suffer from large runtimes. Recently, a new hybrid error correcting method has been proposed, where the second generation data is first assembled into a de Bruijn graph, on which the long reads are then aligned. RESULTS: In this context we present Jabba, a hybrid method to correct long third generation reads by mapping them on a corrected de Bruijn graph that was constructed from second generation data. Unique to our method is the use of a pseudo alignment approach with a seed-and-extend methodology, using maximal exact matches (MEMs) as seeds. In addition to benchmark results, certain theoretical results concerning the possibilities and limitations of the use of MEMs in the context of third generation reads are presented. CONCLUSION: Jabba produces highly reliable corrected reads: almost all corrected reads align to the reference, and these alignments have a very high identity. Many of the aligned reads are error-free. Additionally, Jabba corrects reads using a very low amount of CPU time. From this we conclude that pseudo alignment with MEMs is a fast and reliable method to map long highly erroneous sequences on a de Bruijn graph.

14.
PLoS One ; 9(5): e97896, 2014.
Artículo en Inglés | MEDLINE | ID: mdl-24879305

RESUMEN

Subgraph matching algorithms are used to find and enumerate specific interconnection structures in networks. By enumerating these specific structures/subgraphs, the fundamental properties of the network can be derived. More specifically in biological networks, subgraph matching algorithms are used to discover network motifs, specific patterns occurring more often than expected by chance. Finding these network motifs yields information on the underlying biological relations modelled by the network. In this work, we present the Index-based Subgraph Matching Algorithm with General Symmetries (ISMAGS), an improved version of the Index-based Subgraph Matching Algorithm (ISMA). ISMA quickly finds all instances of a predefined motif in a network by intelligently exploring the search space and taking into account easily identifiable symmetric structures. However, more complex symmetries (possibly involving switching multiple nodes) are not taken into account, resulting in superfluous output. ISMAGS overcomes this problem by using a customised symmetry analysis phase to detect all symmetric structures in the network motif subgraphs. These structures are then converted to symmetry-breaking constraints used to prune the search space and speed up calculations. The performance of the algorithm was tested on several types of networks (biological, social and computer networks) for various subgraphs with a varying degree of symmetry. For subgraphs with complex (multi-node) symmetric structures, high speed-up factors are obtained as the search space is pruned by the symmetry-breaking constraints. For subgraphs with no or simple symmetric structures, ISMAGS still reduces computation times by optimising set operations. Moreover, the calculated list of subgraph instances is minimal as it contains no instances that differ by only a subgraph symmetry. An implementation of the algorithm is freely available at https://github.com/mhoubraken/ISMAGS.


Asunto(s)
Algoritmos , Gráficos por Computador , Modelos Teóricos , Factores de Tiempo
15.
PLoS One ; 8(4): e61183, 2013.
Artículo en Inglés | MEDLINE | ID: mdl-23620730

RESUMEN

Subgraph matching algorithms are designed to find all instances of predefined subgraphs in a large graph or network and play an important role in the discovery and analysis of so-called network motifs, subgraph patterns which occur more often than expected by chance. We present the index-based subgraph matching algorithm (ISMA), a novel tree-based algorithm. ISMA realizes a speedup compared to existing algorithms by carefully selecting the order in which the nodes of a query subgraph are investigated. In order to achieve this, we developed a number of data structures and maximally exploited symmetry characteristics of the subgraph. We compared ISMA to a naive recursive tree-based algorithm and to a number of well-known subgraph matching algorithms. Our algorithm outperforms the other algorithms, especially on large networks and with large query subgraphs. An implementation of ISMA in Java is freely available at http://sourceforge.net/projects/isma/.


Asunto(s)
Algoritmos , Transducción de Señal , Programas Informáticos , Factores de Tiempo
SELECCIÓN DE REFERENCIAS
Detalles de la búsqueda