1.
Nat Methods ; 19(10): 1230-1233, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36109679

ABSTRACT

Complex structural variants (CSVs) encompass multiple breakpoints and are often missed or misinterpreted. We developed SVision, a deep-learning-based multi-object-recognition framework, to automatically detect and characterize CSVs from long-read sequencing data. SVision outperforms current callers at identifying the internal structure of complex events and has revealed 80 high-quality CSVs with 25 distinct structures from an individual genome. SVision directly detects CSVs without matching known structures, allowing sensitive detection of both common and previously uncharacterized complex rearrangements.


Subject(s)
Deep Learning , Genome , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA
2.
Brief Bioinform ; 24(4), 2023 Jul 20.
Article in English | MEDLINE | ID: mdl-37200087

ABSTRACT

Structural variant (SV) detection is essential for genomic studies, and long-read sequencing technologies have advanced our capacity to detect SVs directly from reads or from de novo assemblies, known as the read-based and assembly-based strategies, respectively. However, to date, no independent studies have compared and benchmarked the two strategies. Here, on the basis of SVs detected by 20 read-based and eight assembly-based detection pipelines from six datasets of the HG002 genome, we investigated the factors that influence the two strategies and assessed their performance with well-curated SVs. We found that up to 80% of the SVs could be detected by both strategies among different long-read datasets, whereas the variant type, size, and breakpoint detected by the read-based strategy were greatly affected by aligners. For the high-confidence insertions and deletions at non-tandem repeat regions, a remarkable subset of them (82% in assembly-based calls and 93% in read-based calls), accounting for around 4000 SVs, could be captured by both reads and assemblies. However, discordance between the two strategies was largely caused by complex SVs and inversions, which resulted from inconsistent alignment of reads and assemblies at these loci. Finally, benchmarking with SVs at medically relevant genes, the recall of the read-based strategy reached 77% on 5X coverage data, whereas the assembly-based strategy required 20X coverage data to achieve similar performance. Therefore, integrating SVs from reads and assemblies is suggested for general-purpose detection because of inconsistently detected complex SVs and inversions, whereas the assembly-based strategy is optional for applications with limited resources.
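
As a rough illustration of how the share of SVs "detected by both strategies" might be computed, the sketch below compares two call sets. It is not the procedure used in the study: the `SV` record, the `matches` rule, and the thresholds `max_shift=500` and `min_size_ratio=0.7` are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record (not a real caller's output format)."""
    chrom: str
    pos: int       # start breakpoint
    svtype: str    # e.g. "INS", "DEL", "INV"
    length: int


def matches(a, b, max_shift=500, min_size_ratio=0.7):
    """Assumed matching rule: same chromosome and type, nearby breakpoints,
    and similar size. The thresholds are illustrative only."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.pos - b.pos) > max_shift:
        return False
    small, large = sorted((a.length, b.length))
    return large > 0 and small / large >= min_size_ratio


def concordance(read_calls, asm_calls):
    """Return (fraction of read-based calls with an assembly-based match,
    fraction of assembly-based calls with a read-based match)."""
    def shared_fraction(calls, others):
        if not calls:
            return 0.0
        hits = sum(any(matches(c, o) for o in others) for c in calls)
        return hits / len(calls)

    return shared_fraction(read_calls, asm_calls), shared_fraction(asm_calls, read_calls)


read_calls = [
    SV("chr1", 1000, "DEL", 300),
    SV("chr1", 5000, "INS", 120),
    SV("chr2", 9000, "INV", 2000),   # no counterpart in the assembly calls
]
asm_calls = [
    SV("chr1", 1040, "DEL", 310),
    SV("chr1", 5003, "INS", 118),
]
read_shared, asm_shared = concordance(read_calls, asm_calls)
```

Reporting the fraction separately for each call set mirrors the way the abstract quotes two percentages (one for assembly-based calls, one for read-based calls) for the same shared subset.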


Subject(s)
Benchmarking , Genome, Human , Humans , Sequence Analysis , Genomics/methods , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods
3.
Bioinformatics ; 39(1), 2023 Jan 01.
Article in English | MEDLINE | ID: mdl-36594541

ABSTRACT

MOTIVATION: Beyond identifying genetic variants, we introduce a set of Boolean relations, which allows for a comprehensive classification of the relations of every pair of variants by taking all minimal alignments into account. We present an efficient algorithm to compute these relations, including a novel way of efficiently computing all minimal alignments within the best theoretical complexity bounds. RESULTS: We show that these relations are common, and many non-trivial, for variants of the CFTR gene in dbSNP. Ultimately, we present an approach for the storing and indexing of variants in the context of a database that enables efficient querying for all these relations. AVAILABILITY AND IMPLEMENTATION: A Python implementation is available at https://github.com/mutalyzer/algebra/tree/v0.2.0 as well as an interface at https://mutalyzer.nl/algebra.


Subject(s)
Algorithms , Data Management , Databases, Factual , Software
4.
Genomics Proteomics Bioinformatics ; 20(1): 205-218, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34224879

ABSTRACT

Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered as the simultaneous occurrence of simple structural variants. However, detecting the compounded mutational signals of CSVs is challenging through a commonly used model-match strategy. As a result, there has been limited progress for CSV discovery compared with simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs, and proposed Mako, utilizing a bottom-up guided model-free strategy, to detect CSVs from paired-end short-read sequencing. Specifically, we implemented a graph-based pattern growth approach, where the graph depicts potential breakpoint connections, and pattern growth enables CSV detection without pre-defined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperformed other algorithms. Notably, validation rates of CSVs on real data based on experimental and computational validations as well as manual inspections are around 70%, where the medians of experimental and computational breakpoint shift are 13 bp and 26 bp, respectively. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of a CSV event and uncovered a total of 15 CSV types, including two novel types of adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on the formation of CSVs. Mako is publicly available at https://github.com/xjtu-omics/Mako.


Subject(s)
Algorithms , Genomics , Genome , High-Throughput Nucleotide Sequencing , Mutation , Sequence Analysis, DNA
5.
Bioinformatics ; 23(6): 687-93, 2007 Mar 15.
Article in English | MEDLINE | ID: mdl-17237070

ABSTRACT

MOTIVATION: Pattern discovery in protein sequences is often based on multiple sequence alignments (MSA). The procedure can be computationally intensive and often requires manual adjustment, which may be particularly difficult for a set of deviating sequences. In contrast, two algorithms, PRATT2 (http://www.ebi.ac.uk/pratt/) and TEIRESIAS (http://cbcsrv.watson.ibm.com/) are used to directly identify frequent patterns from unaligned biological sequences without an attempt to align them. Here we propose a new algorithm with more efficiency and more functionality than both PRATT2 and TEIRESIAS, and discuss some of its applications to G protein-coupled receptors, a protein family of important drug targets. RESULTS: In this study, we designed and implemented six algorithms to mine three different pattern types from either one or two datasets using a pattern growth approach. We compared our approach to PRATT2 and TEIRESIAS in efficiency, completeness and the diversity of pattern types. Compared to PRATT2, our approach is faster, capable of processing large datasets and able to identify the so-called type III patterns. Our approach is comparable to TEIRESIAS in the discovery of the so-called type I patterns but has additional functionality such as mining the so-called type II and type III patterns and finding discriminating patterns between two datasets. AVAILABILITY: The source code for pattern growth algorithms and their pseudo-code are available at http://www.liacs.nl/home/kosters/pg/.
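
The general idea of pattern growth on unaligned sequences can be sketched as follows. This is a minimal illustration and not one of the six algorithms from the paper: it mines only contiguous patterns (no gaps or wildcards), and the function name `frequent_substrings` and its parameters are hypothetical.

```python
from collections import defaultdict


def frequent_substrings(sequences, min_support, max_len=10):
    """Mine contiguous patterns that occur in at least `min_support` sequences.

    Patterns are grown one residue at a time. A pattern is only extended if
    it is itself frequent, because no extension of an infrequent pattern can
    be frequent (the anti-monotone property that pattern growth relies on).
    """
    results = {}

    # Seed the search with single residues. Each occurrence is stored as
    # (sequence index, position just past the end of the match).
    frontier = defaultdict(list)
    for i, seq in enumerate(sequences):
        for j, residue in enumerate(seq):
            frontier[residue].append((i, j + 1))

    while frontier:
        next_frontier = defaultdict(list)
        for pattern, occurrences in frontier.items():
            support = len({i for i, _ in occurrences})
            if support < min_support:
                continue  # prune: do not grow infrequent patterns
            results[pattern] = support
            if len(pattern) >= max_len:
                continue
            # Grow the pattern by the residue that follows each occurrence.
            for i, end in occurrences:
                if end < len(sequences[i]):
                    next_frontier[pattern + sequences[i][end]].append((i, end + 1))
        frontier = next_frontier

    return results


toy_sequences = ["ACDEF", "XCDEY", "CDQ"]
in_all_three = frequent_substrings(toy_sequences, min_support=3)
in_at_least_two = frequent_substrings(toy_sequences, min_support=2)
```

No alignment is computed at any point: occurrences are tracked per sequence, and only the positions of already frequent patterns are revisited when growing them.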


Subject(s)
Databases, Protein , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Software , Algorithms , Amino Acid Sequence , Database Management Systems , Molecular Sequence Data
6.
Appl Netw Sci ; 3(1): 39, 2018.
Article in English | MEDLINE | ID: mdl-30839798

ABSTRACT

In corporate networks, firms are connected through links of corporate ownership and shared directors, connecting the control over major economic actors in our economies in meaningful and consequential ways. Most research thus far has focused on the connectedness of firms as a result of one particular link type, analyzing node-specific metrics or global network-based methods to gain insights into the modelled corporate system. In this paper, we aim to understand multiplex corporate networks with multiple types of connections, specifically investigating the network's essential building blocks: multiplex network motifs. Motifs, which are small subgraph patterns occurring at significantly higher frequencies than in similar random networks, have demonstrated their usefulness in understanding the structure of many types of real-world networks. However, detecting motifs in multiplex networks is nontrivial for two reasons. First of all, there are no out-of-the-box subgraph enumeration algorithms for multiplex networks. Second, existing null models to test network motif significance are unable to incorporate the interlayer dependencies in the multiplex network. We solve these two issues by introducing a layer encoding algorithm that incorporates the multiplex aspect in the subgraph enumeration phase. In addition, we propose a null model that is able to preserve the interlayer connectedness, while taking into account that one of the link types is actually the result of a projection of an underlying bipartite network. The experimental section considers the corporate network of Germany, in which tens of thousands of firms are connected through several hundred thousand links. We demonstrate how incorporating the multiplex aspect in motif detection is able to reveal new insights that could not be obtained by studying only one type of relationship. In a general sense, the motifs reflect known corporate governance practices related to the monitoring of investments and the concentration of ownership. A substantial fraction of the discovered motifs is typical for an industrialized country such as Germany, whereas others seem specific for certain economic sectors. Interestingly, we find that motifs involving financial firms are over-represented amongst the larger and more complex motifs. This demonstrates the prominent role of the financial sector in Germany's largely industry-oriented corporate network.
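
One way to picture a layer encoding is to collapse the multiplex network into a single graph whose edge labels record which layers connect each pair of firms. The sketch below is an assumption about what such an encoding could look like, not the algorithm from the paper: the bitmask scheme, the function names `encode_layers` and `decode_layers`, and the treatment of every link as undirected are all choices made for this example.

```python
from collections import defaultdict


def encode_layers(multiplex_edges, layer_names):
    """Collapse (u, v, layer) triples into {(u, v): code}.

    `code` is a bitmask with one bit per layer, so a pair of firms linked by
    both ownership and a shared director gets a single combined label. Links
    are treated as undirected here, which is a simplifying assumption.
    """
    bit = {name: 1 << k for k, name in enumerate(layer_names)}
    codes = defaultdict(int)
    for u, v, layer in multiplex_edges:
        key = (u, v) if u <= v else (v, u)   # canonical order for undirected pairs
        codes[key] |= bit[layer]
    return dict(codes)


def decode_layers(code, layer_names):
    """Recover the list of layer names stored in a bitmask code."""
    return [name for k, name in enumerate(layer_names) if code & (1 << k)]


layers = ["ownership", "board"]
edges = [
    ("FirmA", "FirmB", "ownership"),
    ("FirmB", "FirmA", "board"),      # same pair, second layer
    ("FirmB", "FirmC", "board"),
]
encoded = encode_layers(edges, layers)
```

With an encoding of this kind, a subgraph enumeration routine that supports edge labels can be run once on the collapsed graph, and each enumerated subgraph can be decoded back into its per-layer links.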

7.
Chemistry ; 10(13): 3252-60, 2004 Jul 05.
Article in English | MEDLINE | ID: mdl-15224334

ABSTRACT

A lanthanide complex, named CLaNP (caged lanthanide NMR probe), has been developed for the characterisation of proteins by paramagnetic NMR spectroscopy. The probe consists of a lanthanide chelated by a derivative of DTPA (diethylenetriaminepentaacetic acid) with two thiol-reactive functional groups. The CLaNP molecule is attached to a protein by two engineered, surface-exposed Cys residues in a bidentate manner. This drastically limits the dynamics of the metal relative to the protein and enables measurements of pseudocontact shifts. NMR spectroscopy experiments on a diamagnetic control and the crystal structure of the probe-protein complex demonstrate that the protein structure is not affected by probe attachment. The probe is able to induce pseudocontact shifts to at least 40 Å from the metal and causes residual dipolar couplings due to alignment at a high magnetic field. The molecule exists in several isomeric forms with different paramagnetic tensors; this provides a fast way to obtain long-range distance restraints.


Subject(s)
Azurin/analogs & derivatives , Lanthanoid Series Elements/chemistry , Nuclear Magnetic Resonance, Biomolecular/methods , Pentetic Acid/chemistry , Azurin/chemistry , Crystallography, X-Ray , Lanthanoid Series Elements/chemical synthesis , Mass Spectrometry , Models, Molecular