Results 1 - 5 of 5
1.
Genome Res ; 33(7): 1208-1217, 2023 07.
Article in English | MEDLINE | ID: mdl-37072187

ABSTRACT

Sequence-to-graph alignment is crucial for applications such as variant genotyping, read error correction, and genome assembly. We propose a novel seeding approach that relies on long inexact matches rather than short exact matches, and show that it yields a better time-accuracy trade-off in settings with up to a [Formula: see text] mutation rate. We use sketches of a subset of graph nodes, which are more robust to indels, and store them in a k-nearest neighbor index to avoid the curse of dimensionality. Our approach contrasts with existing methods and highlights the important role that sketching into vector space can play in bioinformatics applications. We show that our method scales to graphs with 1 billion nodes and has quasi-logarithmic query time for queries with an edit distance of [Formula: see text]. For such queries, longer sketch-based seeds yield a [Formula: see text] increase in recall compared with exact seeds. Our approach can be incorporated into other aligners, providing a novel direction for sequence-to-graph alignment.


Subject(s)
Algorithms , Computational Biology , Computational Biology/methods , Sequence Alignment , Sequence Analysis, DNA/methods
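The idea of seeding by nearest-neighbor search over sketches can be illustrated with a small toy. This is only a sketch under stated assumptions: the abstract does not specify the sketching function, so normalized k-mer count vectors stand in for it here, and a brute-force search stands in for the k-nearest neighbor index. The node names, sequences, and the choice `K = 3` are hypothetical.

```python
from collections import Counter
from itertools import product
import math

K = 3  # k-mer length; an illustrative choice, not taken from the abstract
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def sketch(seq):
    """Map a sequence to a fixed-length vector of normalized k-mer counts.

    Stand-in for the sketching function, which the abstract does not describe.
    """
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts.values()) or 1
    return [counts[kmer] / total for kmer in KMERS]

def l2(u, v):
    """Euclidean distance between two sketch vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest(query, index):
    """Brute-force nearest neighbor; a large graph would need a real k-NN index."""
    q = sketch(query)
    return min(index, key=lambda name: l2(q, index[name]))

# Hypothetical sequences spelled by three graph nodes.
nodes = {
    "node_a": "ACGTACGTTAGCTAGCTAGGATCCA",
    "node_b": "TTTTGGGGCCCCAAAATTTTGGGGC",
    "node_c": "GATTACAGATTACAGATTACAGATT",
}
index = {name: sketch(seq) for name, seq in nodes.items()}

# node_a with one substitution and one deleted base.
query = "ACGTACGTTAGGTAGCTAGATCCA"
best = nearest(query, index)
print(best)
```

The query no longer matches node_a exactly over its full length, yet its sketch stays closest to node_a's, which is the property an inexact seed relies on.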
2.
J Comput Biol ; 27(4): 626-639, 2020 04.
Article in English | MEDLINE | ID: mdl-31891531

ABSTRACT

High-throughput DNA sequencing data are accumulating in public repositories, and efficient approaches for storing and indexing such data are in high demand. In recent research, several graph data structures have been proposed to represent large sets of sequencing data and to allow for efficient querying of sequences. In particular, the concept of labeled de Bruijn graphs has been explored by several groups. Although there has been good progress toward representing the sequence graph in small space, methods for storing a set of labels on top of such graphs are still not sufficiently explored. It is also currently not clear how characteristics of the input data, such as the sparsity and correlations of labels, can help to inform the choice of method to compress the graph labeling. In this study, we present a new compression approach, Multi-binary relation wavelet tree (BRWT), which is adaptive to different kinds of input data. We show an up to 29% improvement in compression performance over the basic BRWT method, and up to a 68% improvement over the current state-of-the-art for de Bruijn graph label compression. To put our results into perspective, we present a systematic analysis of five different state-of-the-art annotation compression schemes, evaluate key metrics on both artificial and real-world data, and discuss how different data characteristics influence the compression performance. We show that the improvements of our new method can be robustly reproduced for different representative real-world data sets.


Subject(s)
Genome/genetics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Computational Biology , Data Compression , Humans , Molecular Sequence Annotation/methods
3.
IEEE Trans Pattern Anal Mach Intell ; 38(7): 1452-64, 2016 07.
Article in English | MEDLINE | ID: mdl-26452249

ABSTRACT

Nonlinear dimensionality reduction methods have demonstrated top-notch performance in many pattern recognition and image classification tasks. Despite their popularity, they suffer from highly expensive time and memory requirements, which render them inapplicable to large-scale datasets. To handle such cases, we propose a new method called "Path-Based Isomap". Similar to Isomap, we exploit geodesic paths to find the low-dimensional embedding. However, instead of preserving pairwise geodesic distances, the low-dimensional embedding is computed via a path-mapping algorithm. Because there are far fewer paths than data points, a significant improvement in time and memory complexity is achieved with comparable performance. The method demonstrates state-of-the-art performance on well-known synthetic and real-world datasets, as well as in the presence of noise.
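The geodesic paths that both Isomap and the proposed method build on can be illustrated as follows. This is a minimal sketch of the geodesic step of classical Isomap only (shortest paths over a k-nearest-neighbor graph); the path-mapping algorithm itself is not described in the abstract and is not reproduced here. The helper names, the semicircle data, and `k=2` are illustrative assumptions.

```python
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph as a dense matrix of edge lengths."""
    n = len(points)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            d = math.dist(points[i], points[j])
            dist[i][j] = d
            dist[j][i] = d
    return dist

def geodesic(dist):
    """All-pairs shortest paths (Floyd-Warshall) over the neighbor graph."""
    n = len(dist)
    g = [row[:] for row in dist]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if g[i][m] + g[m][j] < g[i][j]:
                    g[i][j] = g[i][m] + g[m][j]
    return g

# 21 points on a unit semicircle: a one-dimensional manifold embedded in 2-D.
points = [(math.cos(math.pi * t / 20), math.sin(math.pi * t / 20))
          for t in range(21)]
g = geodesic(knn_graph(points, k=2))

chord = math.dist(points[0], points[-1])  # straight-line distance between the ends
arc = g[0][-1]                            # graph geodesic between the ends
print(chord, arc)
```

The straight-line distance between the two ends is 2, while the graph geodesic follows the curve and comes out just under pi. The cubic-time shortest-path step is also a reminder of why the cost of such methods grows quickly with the number of data points.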

4.
PLoS One ; 7(4): e35673, 2012.
Article in English | MEDLINE | ID: mdl-22558196

ABSTRACT

Functional connectivity in the human brain can be represented as a network using electroencephalography (EEG) signals. These networks, whose node counts can vary from tens to hundreds, are characterized by neurobiologically meaningful graph theory metrics. This study investigates the degree to which various graph metrics depend upon the network size. To this end, EEGs from 32 normal subjects were recorded and functional networks of three different sizes were extracted. A state-space based method was used to calculate cross-correlation matrices between different brain regions. These correlation matrices were used to construct binary adjacency connectomes, which were assessed with regard to a number of graph metrics such as clustering coefficient, modularity, efficiency, economic efficiency, and assortativity. We showed that the estimates of these metrics significantly differ depending on the network size. Larger networks had higher efficiency, higher assortativity and lower modularity compared to those with smaller size and the same density. These findings indicate that the network size should be considered in any comparison of networks across studies.


Subject(s)
Brain/physiology , Nerve Net/physiology , Adult , Aged , Brain Mapping/methods , Electroencephalography , Female , Humans , Male , Middle Aged , Models, Neurological , Sample Size , Signal Processing, Computer-Assisted
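The step from a correlation matrix to a binary adjacency network at a fixed density, followed by two of the metrics named above, can be sketched as below. The thresholding rule (keep the strongest absolute correlations), the convention that nodes with fewer than two neighbors contribute zero clustering, and the toy 4x4 matrix are all assumptions made for illustration; the abstract does not state the authors' exact procedure.

```python
from collections import deque
from itertools import combinations

def binarize(corr, density):
    """Keep the strongest |correlations| until the requested edge density is met."""
    n = len(corr)
    pairs = sorted(combinations(range(n), 2),
                   key=lambda p: abs(corr[p[0]][p[1]]), reverse=True)
    n_edges = round(density * len(pairs))
    adj = [set() for _ in range(n)]
    for i, j in pairs[:n_edges]:
        adj[i].add(j)
        adj[j].add(i)
    return adj

def clustering_coefficient(adj):
    """Mean local clustering; nodes with fewer than two neighbors count as 0."""
    total = 0.0
    for nbrs in adj:
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

def global_efficiency(adj):
    """Mean inverse shortest-path length over ordered node pairs (BFS per node).

    Unreachable pairs contribute 0.
    """
    n = len(adj)
    total = 0.0
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1 / d for node, d in dist.items() if node != s)
    return total / (n * (n - 1))

# Hypothetical correlation matrix between four regions.
corr = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.6],
    [0.1, 0.2, 0.6, 1.0],
]
adj = binarize(corr, density=4 / 6)  # keep 4 of the 6 possible edges
cc = clustering_coefficient(adj)
eff = global_efficiency(adj)
print(cc, eff)
```

Fixing the density rather than the correlation threshold keeps the number of edges comparable across networks, which matches the "same density" comparison described in the abstract.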
5.
Front Hum Neurosci ; 6: 335, 2012.
Article in English | MEDLINE | ID: mdl-23267325

ABSTRACT

Abnormalities in the topology of brain networks may be an important feature and etiological factor for psychogenic non-epileptic seizures (PNES). To explore this possibility, we applied a graph theoretical approach to functional networks based on resting state EEGs from 13 PNES patients and 13 age- and gender-matched controls. The networks were extracted from Laplacian-transformed time-series by a cross-correlation method. PNES patients showed close to normal local and global connectivity and small-world structure, estimated with clustering coefficient, modularity, global efficiency, and small-worldness (SW) metrics, respectively. Yet the number of PNES attacks per month correlated with a weakness of local connectedness and a skewed balance between local and global connectedness quantified with SW, all in EEG alpha band. In beta band, patients demonstrated above-normal resiliency, measured with assortativity coefficient, which also correlated with the frequency of PNES attacks. This interictal EEG phenotype may help improve differentiation between PNES and epilepsy. The results also suggest that local connectivity could be a target for therapeutic interventions in PNES. Selective modulation (strengthening) of local connectivity might improve the skewed balance between local and global connectivity and so prevent PNES events.
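The assortativity coefficient mentioned above can be computed from an edge list as the Pearson correlation between the degrees at the two ends of each edge. The following is a minimal sketch assuming a simple undirected graph with each edge listed once; the function name and the star-graph example are illustrative and are not taken from the study.

```python
import math

def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each edge.

    Assumes a simple undirected graph with each edge listed once.
    Returns NaN when all degrees are equal (zero variance).
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:
        # Count each undirected edge in both directions so the measure is symmetric.
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean = sum(xs) / len(xs)  # xs and ys share the same mean by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var if var > 0 else math.nan

# A star graph: one hub joined to four leaves, the most disassortative case.
star = [("hub", leaf) for leaf in ("a", "b", "c", "d")]
r_star = degree_assortativity(star)
print(r_star)
```

Positive values mean that high-degree nodes tend to connect to other high-degree nodes, the property the abstract interprets as resiliency; the star graph sits at the opposite extreme with a coefficient of -1.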
