Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 10 de 10
Filtrar
Mais filtros

Base de dados
Tipo de documento
Intervalo de ano de publicação
1.
Brief Bioinform ; 25(4)2024 May 23.
Artigo em Inglês | MEDLINE | ID: mdl-38836702

RESUMO

Non-invasive prenatal testing (NIPT) is a quite popular approach for detecting fetal genomic aneuploidies. However, due to the limitations on sequencing read length and coverage, NIPT suffers a bottleneck on further improving performance and conducting earlier detection. The errors mainly come from reference biases and population polymorphism. To break this bottleneck, we proposed NIPT-PG, which enables the NIPT algorithm to learn from population data. A pan-genome model is introduced to incorporate variant and polymorphic loci information from tested population. Subsequently, we proposed a sequence-to-graph alignment method, which considers the read mis-match rates during the mapping process, and an indexing method using hash indexing and adjacency lists to accelerate the read alignment process. Finally, by integrating multi-source aligned read and polymorphic sites across the pan-genome, NIPT-PG obtains a more accurate z-score, thereby improving the accuracy of chromosomal aneuploidy detection. We tested NIPT-PG on two simulated datasets and 745 real-world cell-free DNA sequencing data sets from pregnant women. Results demonstrate that NIPT-PG outperforms the standard z-score test. Furthermore, combining experimental and theoretical analyses, we demonstrate the probably approximately correct learnability of NIPT-PG. In summary, NIPT-PG provides a new perspective for fetal chromosomal aneuploidies detection. NIPT-PG may have broad applications in clinical testing, and its detection results can serve as a reference for false positive samples approaching the critical threshold.


Assuntos
Aneuploidia , Teste Pré-Natal não Invasivo , Humanos , Feminino , Gravidez , Teste Pré-Natal não Invasivo/métodos , Algoritmos , Genômica/métodos , Diagnóstico Pré-Natal/métodos , Análise de Sequência de DNA/métodos
2.
Algorithms Mol Biol ; 19(1): 10, 2024 Mar 11.
Artigo em Inglês | MEDLINE | ID: mdl-38468275

RESUMO

BACKGROUND: We study the problem of finding maximal exact matches (MEMs) between a query string Q and a labeled graph G. MEMs are an important class of seeds, often used in seed-chain-extend type of practical alignment methods because of their strong connections to classical metrics. A principled way to speed up chaining is to limit the number of MEMs by considering only MEMs of length at least κ ( κ -MEMs). However, on arbitrary input graphs, the problem of finding MEMs cannot be solved in truly sub-quadratic time under SETH (Equi et al., TALG 2023) even on acyclic graphs. RESULTS: In this paper we show an O ( n · L · d L - 1 + m + M κ , L ) -time algorithm finding all κ -MEMs between Q and G spanning exactly L nodes in G, where n is the total length of node labels, d is the maximum degree of a node in G, m = | Q | , and M κ , L is the number of output MEMs. We use this algorithm to develop a κ -MEM finding solution on indexable Elastic Founder Graphs (Equi et al., Algorithmica 2022) running in time O ( n H 2 + m + M κ ) , where H is the maximum number of nodes in a block, and M κ is the total number of κ -MEMs. Our results generalize to the analysis of multiple query strings (MEMs between G and any of the strings). Additionally, we provide some experimental results showing that the number of graph MEMs is an order of magnitude smaller than the number of string MEMs of the corresponding concatenated collection. CONCLUSIONS: We show that seed-chain-extend type of alignment methods can be implemented on top of indexable Elastic Founder Graphs by providing an efficient way to produce the seeds between a set of queries and the graph. The code is available in https://github.com/algbio/efg-mems .

3.
BMC Bioinformatics ; 24(1): 400, 2023 Oct 26.
Artigo em Inglês | MEDLINE | ID: mdl-37884897

RESUMO

BACKGROUND: Pan-genome graphs are gaining importance in the field of bioinformatics as data structures to represent and jointly analyze multiple genomes. Compacted de Bruijn graphs are inherently suited for this purpose, as their graph topology naturally reveals similarity and divergence within the pan-genome. Most state-of-the-art pan-genome graphs are represented explicitly in terms of nodes and edges. Recently, an alternative, implicit graph representation was proposed that builds directly upon the unidirectional FM-index. As such, a memory-efficient graph data structure is obtained that inherits the FM-index' backward search functionality. However, this representation suffers from a number of shortcomings in terms of functionality and algorithmic performance. RESULTS: We present a data structure for a pan-genome, compacted de Bruijn graph that aims to address these shortcomings. It is built on the bidirectional FM-index, extending the ability of its unidirectional counterpart to navigate and search the graph in both directions. All basic graph navigation steps can be performed in constant time. Based on these features, we implement subgraph visualization as well as lossless approximate pattern matching to the graph using search schemes. We demonstrate that we can retrieve all occurrences corresponding to a read within a certain edit distance in a very efficient manner. Through a case study, we show the potential of exploiting the information embedded in the graph's topology through visualization and sequence alignment. CONCLUSIONS: We propose a memory-efficient representation of the pan-genome graph that supports subgraph visualization and lossless approximate pattern matching of reads against the graph using search schemes. The C++ source code of our software, called Nexus, is available at https://github.com/biointec/nexus under AGPL-3.0 license.


Assuntos
Algoritmos , Genoma , Análise de Sequência de DNA , Software , Biologia Computacional
4.
Biomimetics (Basel) ; 7(1)2022 Feb 21.
Artigo em Inglês | MEDLINE | ID: mdl-35225919

RESUMO

Metabolic pathways provide key information for achieving a better understanding of life and all its processes; this is useful information for the improvement of medicine, agronomy, pharmacy, and other similar areas. The main analysis tool used to study these pathways is based on pathway comparison, using graph data structures. Metabolic pathway comparison has been defined as a computationally complex task. In a previous work, two new algorithms were introduced to treat the problem of metabolic pathway pairwise comparison. Here we provide an extended analysis with more data and a deeper analysis of metabolic pathway comparison as listed in the discussion and results section.

5.
Netw Neurosci ; 5(3): 711-733, 2021.
Artigo em Inglês | MEDLINE | ID: mdl-34746624

RESUMO

The interactions between different brain regions can be modeled as a graph, called connectome, whose nodes correspond to parcels from a predefined brain atlas. The edges of the graph encode the strength of the axonal connectivity between regions of the atlas that can be estimated via diffusion magnetic resonance imaging (MRI) tractography. Herein, we aim to provide a novel perspective on the problem of choosing a suitable atlas for structural connectivity studies by assessing how robustly an atlas captures the network topology across different subjects in a homogeneous cohort. We measure this robustness by assessing the alignability of the connectomes, namely the possibility to retrieve graph matchings that provide highly similar graphs. We introduce two novel concepts. First, the graph Jaccard index (GJI), a graph similarity measure based on the well-established Jaccard index between sets; the GJI exhibits natural mathematical properties that are not satisfied by previous approaches. Second, we devise WL-align, a new technique for aligning connectomes obtained by adapting the Weisfeiler-Leman (WL) graph-isomorphism test. We validated the GJI and WL-align on data from the Human Connectome Project database, inferring a strategy for choosing a suitable parcellation for structural connectivity studies. Code and data are publicly available.

6.
J Neurosci Methods ; 348: 109014, 2021 01 15.
Artigo em Inglês | MEDLINE | ID: mdl-33309587

RESUMO

BACKGROUND: Presence of multimodal brain graphs derived from different neuroimaging modalities is inarguably one of the most critical challenges in building unified classification models that can be trained and tested on any brain graph regardless of its size and the modality it was derived from. EXISTING METHODS: One solution is to learn a model for each modality independently, which is cumbersome and becomes more time-consuming as the number of modalities increases. Another traditional solution is to build a model inputting multimodal brain graphs for the target prediction task; however, this is only applicable to datasets where all samples have joint neuro-modalities. NEW METHOD: In this paper, we propose to build a unified brain graph classification model trained on unpaired multimodal brain graphs, which can classify any brain graph of any size. This is enabled by incorporating a graph alignment step where all multi-modal graphs of different sizes and heterogeneous distributions are mapped to a common template graph. Next, we design a graph alignment strategy to the target fixed-size template and further apply linear discriminant analysis (LDA) to the aligned graphs as a supervised dimensionality reduction technique for the target classification task. RESULTS: We tested our method on unpaired autistic and healthy brain connectomes derived from functional and morphological MRI datasets (two modalities). CONCLUSION: Our results showed that our unified model method not only has great promise in solving such a challenging problem but achieves comparable performance to models trained on each modality independently.


Assuntos
Transtorno Autístico , Conectoma , Encéfalo/diagnóstico por imagem , Humanos , Imageamento por Ressonância Magnética , Neuroimagem
7.
BMC Bioinformatics ; 21(Suppl 12): 306, 2020 Jul 24.
Artigo em Inglês | MEDLINE | ID: mdl-32703258

RESUMO

BACKGROUND: Graph-based representation of genome assemblies has been recently used in different contexts - from improved reconstruction of plasmid sequences and refined analysis of metagenomic data to read error correction and reference-free haplotype reconstruction. While many of these applications heavily utilize the alignment of long nucleotide sequences to assembly graphs, first general-purpose software tools for finding such alignments have been released only recently and their deficiencies and limitations are yet to be discovered. Moreover, existing tools can not perform alignment of amino acid sequences, which could prove useful in various contexts - in particular the analysis of metagenomic sequencing data. RESULTS: In this work we present a novel SPAligner (Saint-Petersburg Aligner) tool for aligning long diverged nucleotide and amino acid sequences to assembly graphs. We demonstrate that SPAligner is an efficient solution for mapping third generation sequencing reads onto assembly graphs of various complexity and also show how it can facilitate the identification of known genes in complex metagenomic datasets. CONCLUSIONS: Our work will facilitate accelerating the development of graph-based approaches in solving sequence to genome assembly alignment problem. SPAligner is implemented as a part of SPAdes tools library and is available on Github.


Assuntos
Algoritmos , Variação Genética , Alinhamento de Sequência , Sequência de Bases , Haplótipos/genética , Humanos , Software , Estatística como Assunto , beta-Lactamases/química
8.
BMC Bioinformatics ; 19(1): 444, 2018 Nov 20.
Artigo em Inglês | MEDLINE | ID: mdl-30458725

RESUMO

BACKGROUND: While the reconstruction of transcripts from a sample of RNA-Seq data is a computationally expensive and complicated task, the detection of splicing events from RNA-Seq data and a gene annotation is computationally feasible. This latter task, which is adequate for many transcriptome analyses, is usually achieved by aligning the reads to a reference genome, followed by comparing the alignments with a gene annotation, often implicitly represented by a graph: the splicing graph. RESULTS: We present ASGAL (Alternative Splicing Graph ALigner): a tool for mapping RNA-Seq data to the splicing graph, with the specific goal of detecting novel splicing events, involving either annotated or unannotated splice sites. ASGAL takes as input the annotated transcripts of a gene and a RNA-Seq sample, and computes (1) the spliced alignments of each read in input, and (2) a list of novel events with respect to the gene annotation. CONCLUSIONS: An experimental analysis shows that ASGAL allows to enrich the annotation with novel alternative splicing events even when genes in an experiment express at most one isoform. Compared with other tools which use the spliced alignment of reads against a reference genome for differential analysis, ASGAL better predicts events that use splice sites which are novel with respect to a splicing graph, showing a higher accuracy. To the best of our knowledge, ASGAL is the first tool that detects novel alternative splicing events by directly aligning reads to a splicing graph. AVAILABILITY: Source code, documentation, and data are available for download at http://asgal.algolab.eu .


Assuntos
Processamento Alternativo/genética , Splicing de RNA/genética , RNA/genética , Análise de Sequência de RNA/métodos , Humanos
9.
BMC Bioinformatics ; 19(1): 311, 2018 Sep 04.
Artigo em Inglês | MEDLINE | ID: mdl-30180801

RESUMO

BACKGROUND: Aligning short reads to a reference genome is an important task in many genome analysis pipelines. This task is computationally more complex when the reference genome is provided in the form of a de Bruijn graph instead of a linear sequence string. RESULTS: We present a branch and bound alignment algorithm that uses the seed-and-extend paradigm to accurately align short Illumina reads to a graph. Given a seed, the algorithm greedily explores all branches of the tree until the optimal alignment path is found. To reduce the search space we compute upper bounds to the alignment score for each branch and discard the branch if it cannot improve the best solution found so far. Additionally, by using a two-pass alignment strategy and a higher-order Markov model, paths in the de Bruijn graph that do not represent a subsequence in the original reference genome are discarded from the search procedure. CONCLUSIONS: BrownieAligner is applied to both synthetic and real datasets. It generally outperforms other state-of-the-art tools in terms of accuracy, while having similar runtime and memory requirements. Our results show that using the higher-order Markov model in BrownieAligner improves the accuracy, while the branch and bound algorithm reduces runtime. BrownieAligner is written in standard C++11 and released under GPL license. BrownieAligner relies on multithreading to take advantage of multi-core/multi-CPU systems. The source code is available at: https://github.com/biointec/browniealigner.


Assuntos
Algoritmos , Biologia Computacional/métodos , Gráficos por Computador , Genoma Humano , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Análise de Sequência de DNA/métodos , Humanos , Linguagens de Programação
10.
J Theor Biol ; 365: 226-37, 2015 Jan 21.
Artigo em Inglês | MEDLINE | ID: mdl-25308508

RESUMO

Bifurcating developmental branching morphogenesis gives rise to complex organs such as the lung and the ureteric tree of the kidney. However, a few quantitative methods or tools exist to compare and distinguish, at a structural level, the critical features of these important biological systems. Here we develop novel graph alignment techniques to quantify the structural differences of rooted bifurcating trees and demonstrate their application in the analysis of developing kidneys from in normal and mutant mice. We have developed two graph based metrics: graph discordance, which measures how well the graphs representing the branching structures of distinct trees graphs can be aligned or overlayed; and graph inclusion, which measures the degree of containment of a tree graph within another. To demonstrate the application of these approaches we first benchmark the discordance metric on a data set of 32 normal and 28Tgfß(+/-) mutant mouse ureteric trees. We find that the discordance metric better distinguishes control and mutant mouse kidneys than alternative metrics based on graph size and fingerprints - the distribution of tip depths. Using this metric we then show that the structure of the mutant trees follows the same pattern as the normal kidneys, but undergo a major delay in elaboration at later stages. Analysis of both controls and mutants using the inclusion metric gives strong support to the hypothesis that ureteric tree growth is stereotypic. Additionally, we present a new generalised multi-tree alignment algorithm that minimises the sum of pairwise graph discordance and which can be used to generate maximum consensus trees that represent the archetype for fixed developmental stages. These tools represent an advance in the analysis and quantification of branching patterns and will be invaluable in gaining a deeper understanding of the mechanisms that drive development. All code is being made available with documentation and example data with this publication.


Assuntos
Morfogênese , Ureter/crescimento & desenvolvimento , Animais , Rim/crescimento & desenvolvimento , Rim/metabolismo , Camundongos , Mutação/genética , Fator de Crescimento Transformador beta2/metabolismo , Ureter/metabolismo
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA