Pesquisa | BVS Doenças Infecciosas e Parasitárias

Correction: Predicting Carriers of Ongoing Selective Sweeps without Knowledge of the Favored Allele.

Ronen, Roy; Tesler, Glenn; Akbari, Ali; Zakov, Shay; Rosenberg, Noah A; Bafna, Vineet.

PLoS Genet ; 12(11): e1006472, 2016 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-27902698

RESUMO

[This corrects the article DOI: 10.1371/journal.pgen.1005527.].

Predicting Carriers of Ongoing Selective Sweeps without Knowledge of the Favored Allele.

Ronen, Roy; Tesler, Glenn; Akbari, Ali; Zakov, Shay; Rosenberg, Noah A; Bafna, Vineet.

PLoS Genet ; 11(9): e1005527, 2015 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-26402243

RESUMO

Methods for detecting the genomic signatures of natural selection have been heavily studied, and they have been successful in identifying many selective sweeps. For most of these sweeps, the favored allele remains unknown, making it difficult to distinguish carriers of the sweep from non-carriers. In an ongoing selective sweep, carriers of the favored allele are likely to contain a future most recent common ancestor. Therefore, identifying them may prove useful in predicting the evolutionary trajectory--for example, in contexts involving drug-resistant pathogen strains or cancer subclones. The main contribution of this paper is the development and analysis of a new statistic, the Haplotype Allele Frequency (HAF) score. The HAF score, assigned to individual haplotypes in a sample, naturally captures many of the properties shared by haplotypes carrying a favored allele. We provide a theoretical framework for computing expected HAF scores under different evolutionary scenarios, and we validate the theoretical predictions with simulations. As an application of HAF score computations, we develop an algorithm (PreCIOSS: Predicting Carriers of Ongoing Selective Sweeps) to identify carriers of the favored allele in selective sweeps, and we demonstrate its power on simulations of both hard and soft sweeps, as well as on data from well-known sweeps in human populations.

Assuntos

Alelos , Triagem de Portadores Genéticos , Seleção Genética , Algoritmos , Haplótipos , Humanos , Modelos Teóricos

An algorithmic approach for breakage-fusion-bridge detection in tumor genomes.

Zakov, Shay; Kinsella, Marcus; Bafna, Vineet.

Proc Natl Acad Sci U S A ; 110(14): 5546-51, 2013 Apr 02.

Artigo em Inglês | MEDLINE | ID: mdl-23503850

RESUMO

Breakage-fusion-bridge (BFB) is a mechanism of genomic instability characterized by the joining and subsequent tearing apart of sister chromatids. When this process is repeated during multiple rounds of cell division, it leads to patterns of copy number increases of chromosomal segments as well as fold-back inversions where duplicated segments are arranged head-to-head. These structural variations can then drive tumorigenesis. BFB can be observed in progress using cytogenetic techniques, but generally BFB must be inferred from data such as microarrays or sequencing collected after BFB has ceased. Making correct inferences from this data is not straightforward, particularly given the complexity of some cancer genomes and BFB's ability to generate a wide range of rearrangement patterns. Here we present algorithms to aid the interpretation of evidence for BFB. We first pose the BFB count-vector problem: given a chromosome segmentation and segment copy numbers, decide whether BFB can yield a chromosome with the given segment counts. We present a linear time algorithm for the problem, in contrast to a previous exponential time algorithm. We then combine this algorithm with fold-back inversions to develop tests for BFB. We show that, contingent on assumptions about cancer genome evolution, count vectors and fold-back inversions are sufficient evidence for detecting BFB. We apply the presented techniques to paired-end sequencing data from pancreatic tumors and confirm a previous finding of BFB as well as identify a chromosomal region likely rearranged by BFB cycles, demonstrating the practicality of our approach.

Assuntos

Algoritmos , Biologia Computacional/métodos , Variações do Número de Cópias de DNA/genética , Instabilidade Genômica/genética , Modelos Genéticos , Neoplasias/genética , Troca de Cromátide Irmã/genética , Simulação por Computador

Evaluating genome architecture of a complex region via generalized bipartite matching.

Lo, Christine; Kim, Sangwoo; Zakov, Shay; Bafna, Vineet.

BMC Bioinformatics ; 14 Suppl 5: S13, 2013.

Artigo em Inglês | MEDLINE | ID: mdl-23734567

RESUMO

With the remarkable development in inexpensive sequencing technologies and supporting computational tools, we have the promise of medicine being personalized by knowledge of the individual genome. Current technologies provide high throughput, but short reads. Reconstruction of the donor genome is based either on de novo assembly of the (short) reads, or on mapping donor reads to a standard reference. While such techniques demonstrate high success rates for inferring 'simple' genomic segments, they are confounded by segments with complex duplication patterns, including regions of direct medical relevance, like the HLA and the KIR regions.In this work, we address this problem with a method for assessing the quality of a predicted genome sequence for complex regions of the genome. This method combines two natural types of evidence: sequence similarity of the mapped reads to the predicted donor genome, and distribution of reads across the predicted genome. We define a new scoring function for read-to-genome matchings, which penalizes for sequence dissimilarities and deviations from expected read location distribution, and present an efficient algorithm for finding matchings that minimize the penalty. The algorithm is based on a formal problem, first defined in this paper, called Coverage Sensitive many-to-many min-cost bipartite Matching (CSM). This new problem variant generalizes the standard (one-to-one) weighted bipartite matching problem, and can be solved using network flows. The resulting Java-based tool, called SAGE (Scoring function for Assembled GEnomes), is freely available upon request. We demonstrate over simulated data that SAGE can be used to infer correct haplotypes of the highly repetitive KIR region on the Human chromosome 19.

Assuntos

Algoritmos , Genômica/métodos , Mapeamento Cromossômico , Diploide , Genoma Humano , Haploidia , Haplótipos , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Receptores KIR/genética , Análise de Sequência de DNA , Software

Reconstructing breakage fusion bridge architectures using noisy copy numbers.

Zakov, Shay; Bafna, Vineet.

J Comput Biol ; 22(6): 577-94, 2015 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-26020441

RESUMO

The Breakage Fusion Bridge (BFB) process is a key marker for genomic instability, producing highly rearranged genomes in relatively small numbers of cell cycles. While the process itself was observed during the late 1930s, little is known about the extent of BFB in tumor genome evolution. Moreover, BFB can dramatically increase copy numbers of chromosomal segments, which in turn hardens the tasks of both reference-assisted and ab initio genome assembly. Based on available data such as Next Generation Sequencing (NGS) and Array Comparative Genomic Hybridization (aCGH) data, we show here how BFB evidence may be identified, and how to enumerate all possible evolutions of the process with respect to observed data. Specifically, we describe practical algorithms that, given a chromosomal arm segmentation and noisy segment copy number estimates, produce all segment count vectors supported by the data that can be produced by BFB, and all corresponding BFB architectures. This extends the scope of analyses described in our previous work, which produced a single count vector and architecture per instance. We apply these analyses to a comprehensive human cancer dataset, demonstrate the effectiveness and efficiency of the computation, and suggest methods for further assertions of candidate BFB samples. Source code of our tool can be found online.

Assuntos

Cromossomos/genética , Dosagem de Genes/genética , Genoma Humano/genética , Instabilidade Genômica/genética , Algoritmos , Hibridização Genômica Comparativa , Amplificação de Genes/genética , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Humanos , Neoplasias/genética

Efficient edit distance with duplications and contractions.

Pinhas, Tamar; Zakov, Shay; Tsur, Dekel; Ziv-Ukelson, Michal.

Algorithms Mol Biol ; 8(1): 27, 2013 Oct 29.

Artigo em Inglês | MEDLINE | ID: mdl-24168705

RESUMO

: We propose three algorithms for string edit distance with duplications and contractions. These include an efficient general algorithm and two improvements which apply under certain constraints on the cost function. The new algorithms solve a more general problem variant and obtain better time complexities with respect to previous algorithms. Our general algorithm is based on min-plus multiplication of square matrices and has time and space complexities of O (|Σ|MP (n)) and O (|Σ|n2), respectively, where |Σ| is the alphabet size, n is the length of the strings, and MP (n) is the time bound for the computation of min-plus matrix multiplication of two n × n matrices (currently, MP(n)=On3log3lognlog2n due to an algorithm by Chan).For integer cost functions, the running time is further improved to O|Σ|n3log2n. In addition, this variant of the algorithm is online, in the sense that the input strings may be given letter by letter, and its time complexity bounds the processing time of the first n given letters. This acceleration is based on our efficient matrix-vector min-plus multiplication algorithm, intended for matrices and vectors for which differences between adjacent entries are from a finite integer interval D. Choosing a constant 1log|D|n<λ<1, the algorithm preprocesses an n × n matrix in On2+λ|D| time and On2+λ|D|λ2log|D|2n space. Then, it may multiply the matrix with any given n-length vector in On2λ2log|D|2n time. Under some discreteness assumptions, this matrix-vector min-plus multiplication algorithm applies to several problems from the domains of context-free grammar parsing and RNA folding and, in particular, implies the asymptotically fastest On3log2n time algorithm for single-strand RNA folding with discrete cost functions.Finally, assuming a different constraint on the cost function, we present another version of the algorithm that exploits the run-length encoding of the strings and runs in O|Σ|nMP(ñ)ñ time and O(|Σ|nñ) space, where ñ is the length of the run-length encoding of the strings.

Unrooted unordered homeomorphic subtree alignment of RNA trees.

Milo, Nimrod; Zakov, Shay; Katzenelson, Erez; Bachmat, Eitan; Dinitz, Yefim; Ziv-Ukelson, Michal.

Algorithms Mol Biol ; 8(1): 13, 2013 Apr 16.

Artigo em Inglês | MEDLINE | ID: mdl-23590940

RESUMO

: We generalize some current approaches for RNA tree alignment, which are traditionally confined to ordered rooted mappings, to also consider unordered unrooted mappings. We define the Homeomorphic Subtree Alignment problem (HSA), and present a new algorithm which applies to several modes, combining global or local, ordered or unordered, and rooted or unrooted tree alignments. Our algorithm generalizes previous algorithms that either solved the problem in an asymmetric manner, or were restricted to the rooted and/or ordered cases. Focusing here on the most general unrooted unordered case, we show that for input trees T and S, our algorithm has an O(nTnS + min(dT,dS)LTLS) time complexity, where nT,LT and dT are the number of nodes, the number of leaves, and the maximum node degree in T, respectively (satisfying dT ≤ LT ≤ nT), and similarly for nS,LS and dS with respect to the tree S. This improves the time complexity of previous algorithms for less general variants of the problem.In order to obtain this time bound for HSA, we developed new algorithms for a generalized variant of the Min-Cost Bipartite Matching problem (MCM), as well as to two derivatives of this problem, entitled All-Cavity-MCM and All-Pairs-Cavity-MCM. For two input sets of size n and m, where n ≤ m, MCM and both its cavity derivatives are solved in O(n3 + nm) time, without the usage of priority queues (e.g. Fibonacci heaps) or other complex data structures. This gives the first cubic time algorithm for All-Pairs-Cavity-MCM, and improves the running times of MCM and All-Cavity-MCM problems in the unbalanced case where n âª m.We implemented the algorithm (in all modes mentioned above) as a graphical software tool which computes and displays similarities between secondary structures of RNA given as input, and employed it to a preliminary experiment in which we ran all-against-all inter-family pairwise alignments of RNAse P and Hammerhead RNA family members, exposing new similarities which could not be detected by the traditional rooted ordered alignment approaches. The results demonstrate that our approach can be used to expose structural similarity between some RNAs with higher sensitivity than the traditional rooted ordered alignment approaches. Source code and web-interface for our tool can be found in http://www.cs.bgu.ac.il/\~negevcb/FRUUT.

Reducing the worst case running times of a family of RNA and CFG problems, using Valiant's approach.

Zakov, Shay; Tsur, Dekel; Ziv-Ukelson, Michal.

Algorithms Mol Biol ; 6(1): 20, 2011 Aug 18.

Artigo em Inglês | MEDLINE | ID: mdl-21851589

RESUMO

BACKGROUND: RNA secondary structure prediction is a mainstream bioinformatic domain, and is key to computational analysis of functional RNA. In more than 30 years, much research has been devoted to defining different variants of RNA structure prediction problems, and to developing techniques for improving prediction quality. Nevertheless, most of the algorithms in this field follow a similar dynamic programming approach as that presented by Nussinov and Jacobson in the late 70's, which typically yields cubic worst case running time algorithms. Recently, some algorithmic approaches were applied to improve the complexity of these algorithms, motivated by new discoveries in the RNA domain and by the need to efficiently analyze the increasing amount of accumulated genome-wide data. RESULTS: We study Valiant's classical algorithm for Context Free Grammar recognition in sub-cubic time, and extract features that are common to problems on which Valiant's approach can be applied. Based on this, we describe several problem templates, and formulate generic algorithms that use Valiant's technique and can be applied to all problems which abide by these templates, including many problems within the world of RNA Secondary Structures and Context Free Grammars. CONCLUSIONS: The algorithms presented in this paper improve the theoretical asymptotic worst case running time bounds for a large family of important problems. It is also possible that the suggested techniques could be applied to yield a practical speedup for these problems. For some of the problems (such as computing the RNA partition function and base-pair binding probabilities), the presented techniques are the only ones which are currently known for reducing the asymptotic running time bounds of the standard algorithms.

Rich parameterization improves RNA structure prediction.

Zakov, Shay; Goldberg, Yoav; Elhadad, Michael; Ziv-Ukelson, Michal.

J Comput Biol ; 18(11): 1525-42, 2011 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-22035327

RESUMO

Current approaches to RNA structure prediction range from physics-based methods, which rely on thousands of experimentally measured thermodynamic parameters, to machine-learning (ML) techniques. While the methods for parameter estimation are successfully shifting toward ML-based approaches, the model parameterizations so far remained fairly constant. We study the potential contribution of increasing the amount of information utilized by RNA folding prediction models to the improvement of their prediction quality. This is achieved by proposing novel models, which refine previous ones by examining more types of structural elements, and larger sequential contexts for these elements. Our proposed fine-grained models are made practical thanks to the availability of large training sets, advances in machine-learning, and recent accelerations to RNA folding algorithms. We show that the application of more detailed models indeed improves prediction quality, while the corresponding running time of the folding algorithm remains fast. An additional important outcome of this experiment is a new RNA folding prediction model (coupled with a freely available implementation), which results in a significantly higher prediction quality than that of previous models. This final model has about 70,000 free parameters, several orders of magnitude more than previous models. Being trained and tested over the same comprehensive data sets, our model achieves a score of 84% according to the F1-measure over correctly-predicted base-pairs (i.e., 16% error rate), compared to the previously best reported score of 70% (i.e., 30% error rate). That is, the new model yields an error reduction of about 50%. Trained models and source code are available at www.cs.bgu.ac.il/?negevcb/contextfold.

Assuntos

Algoritmos , Modelos Moleculares , RNA/química , Inteligência Artificial , Pareamento de Bases , Sequência de Bases , Simulação por Computador , Conformação de Ácido Nucleico

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA