Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 39
Filtrar
Mais filtros

Base de dados
País/Região como assunto
Tipo de documento
Intervalo de ano de publicação
1.
Discrete Appl Math ; 337: 81-105, 2023 Oct 15.
Artigo em Inglês | MEDLINE | ID: mdl-37213330

RESUMO

For unweighted graphs, finding isometric embeddings of a graph G is closely related to decompositions of G into Cartesian products of smaller graphs. When G is isomorphic to a Cartesian graph product, we call the factors of this product a factorization of G. When G is isomorphic to an isometric subgraph of a Cartesian graph product, we call those factors a pseudofactorization of G. Prior work has shown that an unweighted graph's pseudofactorization can be used to generate a canonical isometric embedding into a product of the smallest possible pseudofactors. However, for arbitrary weighted graphs, which represent a richer variety of metric spaces, methods for finding isometric embeddings or determining their existence remain elusive, and indeed pseudofactorization and factorization have not previously been extended to this context. In this work, we address the problem of finding the factorization and pseudofactorization of a weighted graph G, where G satisfies the property that every edge constitutes a shortest path between its endpoints. We term such graphs minimal graphs, noting that every graph can be made minimal by removing edges not affecting its path metric. We generalize pseudofactorization and factorization to minimal graphs and develop new proof techniques that extend the previously proposed algorithms due to Graham and Winkler [Graham and Winkler, '85] and Feder [Feder, '92] for pseudofactorization and factorization of unweighted graphs. We show that any n-vertex, m-edge graph with positive integer edge weights can be factored in O(m2) time, plus the time to find all pairs shortest paths (APSP) distances in a weighted graph, resulting in an overall running time of O(m2+n2 log log n) time. We also show that a pseudofactorization for such a graph can be computed in O(mn) time, plus the time to solve APSP, resulting in an O(mn+n2 log log n) running time.

2.
Discrete Appl Math ; 332: 119-128, 2023 Jun 15.
Artigo em Inglês | MEDLINE | ID: mdl-37982050

RESUMO

A mapping α:V(G)→V(H) from the vertex set of one graph G to another graph H is an isometric embedding if the shortest path distance between any two vertices in G equals the distance between their images in H. Here, we consider isometric embeddings of a weighted graph G into unweighted Hamming graphs, called Hamming embeddings, when G satisfies the property that every edge is a shortest path between its endpoints. Using a Cartesian product decomposition of G called its canonical isometric representation, we show that every Hamming embedding of G may be partitioned into a canonical partition, whose parts provide Hamming embeddings for each factor of the canonical isometric representation of G. This implies that G permits a Hamming embedding if and only if each factor of its canonical isometric representation is Hamming embeddable. This result extends prior work on unweighted graphs that showed that an unweighted graph permits a Hamming embedding if and only if each factor is a complete graph. When a graph G has nontrivial isometric representation, determining whether G has a Hamming embedding can be simplified to checking embeddability of two or more smaller graphs.

3.
Bioinformatics ; 32(17): 2567-76, 2016 09 01.
Artigo em Inglês | MEDLINE | ID: mdl-27153661

RESUMO

MOTIVATION: Many biological data processing problems can be formalized as clustering problems to partition data points into sensible and biologically interpretable groups. RESULTS: This article introduces densityCut, a novel density-based clustering algorithm, which is both time- and space-efficient and proceeds as follows: densityCut first roughly estimates the densities of data points from a K-nearest neighbour graph and then refines the densities via a random walk. A cluster consists of points falling into the basin of attraction of an estimated mode of the underlining density function. A post-processing step merges clusters and generates a hierarchical cluster tree. The number of clusters is selected from the most stable clustering in the hierarchical cluster tree. Experimental results on ten synthetic benchmark datasets and two microarray gene expression datasets demonstrate that densityCut performs better than state-of-the-art algorithms for clustering biological datasets. For applications, we focus on the recent cancer mutation clustering and single cell data analyses, namely to cluster variant allele frequencies of somatic mutations to reveal clonal architectures of individual tumours, to cluster single-cell gene expression data to uncover cell population compositions, and to cluster single-cell mass cytometry data to detect communities of cells of the same functional states or types. densityCut performs better than competing algorithms and is scalable to large datasets. AVAILABILITY AND IMPLEMENTATION: Data and the densityCut R package is available from https://bitbucket.org/jerry00/densitycut_dev CONTACT: : condon@cs.ubc.ca or sshah@bccrc.ca or jiaruid@cs.ubc.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Algoritmos , Análise por Conglomerados , Perfilação da Expressão Gênica , Expressão Gênica , Humanos , Análise de Sequência com Séries de Oligonucleotídeos
4.
Nat Comput ; 16(2): 261-284, 2017.
Artigo em Inglês | MEDLINE | ID: mdl-28690474

RESUMO

A major goal of natural computing is to design biomolecules, such as nucleic acid sequences, that can be used to perform computations. We design sequences of nucleic acids that are "guaranteed" to have long folding pathways relative to their length. This particular sequences with high probability follow low-barrier folding pathways that visit a large number of distinct structures. Long folding pathways are interesting, because they demonstrate that natural computing can potentially support long and complex computations. Formally, we provide the first scalable designs of molecules whose low-barrier folding pathways, with respect to a simple, stacked pair energy model, grow superlinearly with the molecule length, but for which all significantly shorter alternative folding pathways have an energy barrier that is [Formula: see text] times that of the low-barrier pathway for any [Formula: see text] and a sufficiently long sequence.

6.
BMC Bioinformatics ; 15: 147, 2014 May 18.
Artigo em Inglês | MEDLINE | ID: mdl-24884954

RESUMO

BACKGROUND: Improving accuracy and efficiency of computational methods that predict pseudoknotted RNA secondary structures is an ongoing challenge. Existing methods based on free energy minimization tend to be very slow and are limited in the types of pseudoknots that they can predict. Incorporating known structural information can improve prediction accuracy; however, there are not many methods for prediction of pseudoknotted structures that can incorporate structural information as input. There is even less understanding of the relative robustness of these methods with respect to partial information. RESULTS: We present a new method, Iterative HFold, for pseudoknotted RNA secondary structure prediction. Iterative HFold takes as input a pseudoknot-free structure, and produces a possibly pseudoknotted structure whose energy is at least as low as that of any (density-2) pseudoknotted structure containing the input structure. Iterative HFold leverages strengths of earlier methods, namely the fast running time of HFold, a method that is based on the hierarchical folding hypothesis, and the energy parameters of HotKnots V2.0.Our experimental evaluation on a large data set shows that Iterative HFold is robust with respect to partial information, with average accuracy on pseudoknotted structures steadily increasing from roughly 54% to 79% as the user provides up to 40% of the input structure.Iterative HFold is much faster than HotKnots V2.0, while having comparable accuracy. Iterative HFold also has significantly better accuracy than IPknot on our HK-PK and IP-pk168 data sets. CONCLUSIONS: Iterative HFold is a robust method for prediction of pseudoknotted RNA secondary structures, whose accuracy with more than 5% information about true pseudoknot-free structures is better than that of IPknot, and with about 35% information about true pseudoknot-free structures compares well with that of HotKnots V2.0 while being significantly faster. Iterative HFold and all data used in this work are freely available at http://www.cs.ubc.ca/~hjabbari/software.php.


Assuntos
Algoritmos , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Conformação de Ácido Nucleico , RNA/química , Análise de Sequência de RNA/métodos , Sequência de Bases , Humanos , RNA/genética , Software , Fatores de Tempo
7.
Nat Comput ; 13(4): 499-516, 2014.
Artigo em Inglês | MEDLINE | ID: mdl-25400535

RESUMO

Chemical reaction networks (CRNs) and DNA strand displacement systems (DSDs) are widely-studied and useful models of molecular programming. However, in order for some DSDs in the literature to behave in an expected manner, the initial number of copies of some reagents is required to be fixed. In this paper we show that, when multiple copies of all initial molecules are present, general types of CRNs and DSDs fail to work correctly if the length of the shortest sequence of reactions needed to produce any given molecule exceeds a threshold that grows polynomially with attributes of the system.

8.
Bioinformatics ; 28(2): 167-75, 2012 Jan 15.
Artigo em Inglês | MEDLINE | ID: mdl-22084253

RESUMO

MOTIVATION: The study of cancer genomes now routinely involves using next-generation sequencing technology (NGS) to profile tumours for single nucleotide variant (SNV) somatic mutations. However, surprisingly few published bioinformatics methods exist for the specific purpose of identifying somatic mutations from NGS data and existing tools are often inaccurate, yielding intolerably high false prediction rates. As such, the computational problem of accurately inferring somatic mutations from paired tumour/normal NGS data remains an unsolved challenge. RESULTS: We present the comparison of four standard supervised machine learning algorithms for the purpose of somatic SNV prediction in tumour/normal NGS experiments. To evaluate these approaches (random forest, Bayesian additive regression tree, support vector machine and logistic regression), we constructed 106 features representing 3369 candidate somatic SNVs from 48 breast cancer genomes, originally predicted with naive methods and subsequently revalidated to establish ground truth labels. We trained the classifiers on this data (consisting of 1015 true somatic mutations and 2354 non-somatic mutation positions) and conducted a rigorous evaluation of these methods using a cross-validation framework and hold-out test NGS data from both exome capture and whole genome shotgun platforms. All learning algorithms employing predictive discriminative approaches with feature selection improved the predictive accuracy over standard approaches by statistically significant margins. In addition, using unsupervised clustering of the ground truth 'false positive' predictions, we noted several distinct classes and present evidence suggesting non-overlapping sources of technical artefacts illuminating important directions for future study. AVAILABILITY: Software called MutationSeq and datasets are available from http://compbio.bccrc.ca.


Assuntos
Algoritmos , Inteligência Artificial , Neoplasias da Mama/genética , Mutação , Polimorfismo de Nucleotídeo Único , Teorema de Bayes , Análise por Conglomerados , Exoma , Feminino , Genoma , Humanos , Modelos Genéticos , Neoplasias , Software , Máquina de Vetores de Suporte
9.
Comput Biol Chem ; 104: 107837, 2023 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-36858009

RESUMO

Predicting the kinetics of reactions involving nucleic acid strands is a fundamental task in biology and biotechnology. Reaction kinetics can be modeled as an elementary step continuous-time Markov chain, where states correspond to secondary structures and transitions correspond to base pair formation and breakage. Since the number of states in the Markov chain could be large, rates are determined by estimating the mean first passage time from sampled trajectories. As a result, the cost of kinetic predictions becomes prohibitively expensive for rare events with extremely long trajectories. Also problematic are scenarios where multiple predictions are needed for the same reaction, e.g., under different environmental conditions, or when calibrating model parameters, because a new set of trajectories is needed multiple times. We propose a new method, called pathway elaboration, to handle these scenarios. Pathway elaboration builds a truncated continuous-time Markov chain through both biased and unbiased sampling. The resulting Markov chain has moderate state space size, so matrix methods can efficiently compute reaction rates, even for rare events. Also the transition rates of the truncated Markov chain can easily be adapted when model or environmental parameters are perturbed, making model calibration feasible. We illustrate the utility of pathway elaboration on toehold-mediated strand displacement reactions, show that it well-approximates trajectory-based predictions of unbiased elementary step models on a wide range of reaction types for which such predictions are feasible, and demonstrate that it performs better than alternative truncation-based approaches that are applicable for mean first passage time estimation. Finally, in a small study, we use pathway elaboration to optimize the Metropolis kinetic model of Multistrand, an elementary step simulator, showing that the optimized parameters greatly improve reaction rate predictions. Our framework and dataset are available at https://github.com/DNA-and-Natural-Algorithms-Group/PathwayElaboration.


Assuntos
Algoritmos , DNA , Cadeias de Markov , Cinética , Pareamento de Bases
10.
IEEE/ACM Trans Comput Biol Bioinform ; 20(6): 3842-3850, 2023.
Artigo em Inglês | MEDLINE | ID: mdl-37889827

RESUMO

Aligning electron density maps from Cryogenic electron microscopy (cryo-EM) is a first key step for studying multiple conformations of a biomolecule. As this step remains costly and challenging, with standard alignment tools being potentially stuck in local minima, we propose here a new procedure, called AlignOT, which relies on the use of computational optimal transport (OT) to align EM maps in 3D space. By embedding a fast estimation of OT maps within a stochastic gradient descent algorithm, our method searches for a rotation that minimizes the Wasserstein distance between two maps, represented as point clouds. We quantify the impact of various parameters on the precision and accuracy of the alignment, and show that AlignOT can outperform the standard local alignment methods, with an increased range of rotation angles leading to proper alignment. We further benchmark AlignOT on various pairs of experimental maps, which account for different types of conformational heterogeneities and geometric properties. As our experiments show good performance, we anticipate that our method can be broadly applied to align 3D EM maps.


Assuntos
Algoritmos , Microscopia Crioeletrônica/métodos , Conformação Proteica
11.
BMC Bioinformatics ; 13: 22, 2012 Feb 01.
Artigo em Inglês | MEDLINE | ID: mdl-22296803

RESUMO

BACKGROUND: RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. RESULTS: We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). CONCLUSIONS: Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets.


Assuntos
Algoritmos , RNA/química , Conformação de Ácido Nucleico , Valor Preditivo dos Testes , Ribonuclease P/química , Termodinâmica
12.
RNA ; 16(1): 26-42, 2010 Jan.
Artigo em Inglês | MEDLINE | ID: mdl-19933322

RESUMO

Accurate prediction of RNA pseudoknotted secondary structures from the base sequence is a challenging computational problem. Since prediction algorithms rely on thermodynamic energy models to identify low-energy structures, prediction accuracy relies in large part on the quality of free energy change parameters. In this work, we use our earlier constraint generation and Boltzmann likelihood parameter estimation methods to obtain new energy parameters for two energy models for secondary structures with pseudoknots, namely, the Dirks-Pierce (DP) and the Cao-Chen (CC) models. To train our parameters, and also to test their accuracy, we create a large data set of both pseudoknotted and pseudoknot-free secondary structures. In addition to structural data our training data set also includes thermodynamic data, for which experimentally determined free energy changes are available for sequences and their reference structures. When incorporated into the HotKnots prediction algorithm, our new parameters result in significantly improved secondary structure prediction on our test data set. Specifically, the prediction accuracy when using our new parameters improves from 68% to 79% for the DP model, and from 70% to 77% for the CC model.


Assuntos
Biologia Computacional/métodos , Conformação de Ácido Nucleico , RNA/química , Algoritmos , Sequência de Bases , Metabolismo Energético/fisiologia , Previsões/métodos , Modelos Genéticos , Dados de Sequência Molecular , RNA/análise , Sensibilidade e Especificidade , Análise de Sequência de RNA , Software , Termodinâmica
13.
RNA ; 16(12): 2304-18, 2010 Dec.
Artigo em Inglês | MEDLINE | ID: mdl-20940338

RESUMO

Methods for efficient and accurate prediction of RNA structure are increasingly valuable, given the current rapid advances in understanding the diverse functions of RNA molecules in the cell. To enhance the accuracy of secondary structure predictions, we developed and refined optimization techniques for the estimation of energy parameters. We build on two previous approaches to RNA free-energy parameter estimation: (1) the Constraint Generation (CG) method, which iteratively generates constraints that enforce known structures to have energies lower than other structures for the same molecule; and (2) the Boltzmann Likelihood (BL) method, which infers a set of RNA free-energy parameters that maximize the conditional likelihood of a set of reference RNA structures. Here, we extend these approaches in two main ways: We propose (1) a max-margin extension of CG, and (2) a novel linear Gaussian Bayesian network that models feature relationships, which effectively makes use of sparse data by sharing statistical strength between parameters. We obtain significant improvements in the accuracy of RNA minimum free-energy pseudoknot-free secondary structure prediction when measured on a comprehensive set of 2518 RNA molecules with reference structures. Our parameters can be used in conjunction with software that predicts RNA secondary structures, RNA hybridization, or ensembles of structures. Our data, software, results, and parameter sets in various formats are freely available at http://www.cs.ubc.ca/labs/beta/Projects/RNA-Params.


Assuntos
Biologia Computacional/métodos , Metabolismo Energético/fisiologia , RNA/química , RNA/metabolismo , Estatística como Assunto/métodos , Algoritmos , Animais , Composição de Bases , Sequência de Bases , Biologia Computacional/estatística & dados numéricos , Humanos , Modelos Teóricos , Dados de Sequência Molecular , Conformação de Ácido Nucleico , Reprodutibilidade dos Testes , Sensibilidade e Especificidade , Análise de Sequência de RNA
15.
Bull Environ Contam Toxicol ; 86(2): 159-62, 2011 Feb.
Artigo em Inglês | MEDLINE | ID: mdl-21132490

RESUMO

It is unclear whether mercury concentration in wildlife tissues changes appreciably after lengthy frozen storage. To test whether such freezer-archived samples are stable, small (~10-50 µL) avian blood samples stored in capped glass capillary tubes were analyzed for total mercury concentration, and then reanalyzed after being frozen for up to 3 years. Mercury concentrations increased 6% on average over the 3 year period, but time spent frozen explained only 11% of the variation between measurements. This small amount of change suggests that archived blood samples remain useful for at least several years.


Assuntos
Aves/sangue , Monitoramento Ambiental , Poluentes Ambientais/sangue , Mercúrio/sangue , Animais , Preservação de Sangue
16.
Environ Toxicol Chem ; 28(2): 395-401, 2009 Feb.
Artigo em Inglês | MEDLINE | ID: mdl-18937528

RESUMO

Dynamics of mercury in feathers and blood of free-living songbirds is poorly understood. Nestling eastern bluebirds (Sialia sialis) living along the mercury-contaminated South River (Virginia, USA) had blood mercury levels an order of magnitude lower than their parents (nestling: 0.09 +/- 0.06 mg/kg [mean +/- standard deviation], n = 156; adult: 1.21 +/- 0.57 mg/kg, n = 86). To test whether this low blood mercury was the result of mercury sequestration in rapidly growing feathers, we repeatedly sampled free-living juveniles throughout the period of feather growth and molt. Mean blood mercury concentrations increased to 0.52 +/- 0.36 mg/kg (n = 44) after the completion of feather growth. Some individuals had reached adult blood mercury levels within three months of leaving the nest, but levels dropped to 0.20 +/- 0.09 mg/kg (n = 11) once the autumn molt had begun. Most studies of mercury contamination in juvenile birds have focused on recently hatched young with thousands of rapidly growing feathers. However, the highest risk period for mercury intoxication in young birds may be during the vulnerable period after fledging, when feathers no longer serve as a buffer against dietary mercury. We found that nestling blood mercury levels were not indicative of the extent of contamination because a large portion of the ingested mercury ended up in feathers. The present study demonstrates unequivocally that in songbirds blood mercury level is influenced strongly by the growth and molt of feathers.


Assuntos
Plumas/crescimento & desenvolvimento , Mercúrio/sangue , Aves Canoras/sangue , Animais , Comportamento Predatório , Estações do Ano
17.
BMC Bioinformatics ; 9: 340, 2008 Aug 13.
Artigo em Inglês | MEDLINE | ID: mdl-18700982

RESUMO

BACKGROUND: The ability to access, search and analyse secondary structures of a large set of known RNA molecules is very important for deriving improved RNA energy models, for evaluating computational predictions of RNA secondary structures and for a better understanding of RNA folding. Currently there is no database that can easily provide these capabilities for almost all RNA molecules with known secondary structures. RESULTS: In this paper we describe RNA STRAND - the RNA secondary STRucture and statistical ANalysis Database, a curated database containing known secondary structures of any type and organism. Our new database provides a wide collection of known RNA secondary structures drawn from public databases, searchable and downloadable in a common format. Comprehensive statistical information on the secondary structures in our database is provided using the RNA Secondary Structure Analyser, a new tool we have developed to analyse RNA secondary structures. The information thus obtained is valuable for understanding to which extent and with which probability certain structural motifs can appear. We outline several ways in which the data provided in RNA STRAND can facilitate research on RNA structure, including the improvement of RNA energy models and evaluation of secondary structure prediction programs. In order to keep up-to-date with new RNA secondary structure experiments, we offer the necessary tools to add solved RNA secondary structures to our database and invite researchers to contribute to RNA STRAND. CONCLUSION: RNA STRAND is a carefully assembled database of trusted RNA secondary structures, with easy on-line tools for searching, analyzing and downloading user selected entries, and is publicly available at http://www.rnasoft.ca/strand.


Assuntos
Sistemas de Gerenciamento de Base de Dados , Bases de Dados Genéticas , Modelos Químicos , Modelos Moleculares , RNA/química , RNA/ultraestrutura , Interface Usuário-Computador , Gráficos por Computador , Simulação por Computador , Armazenamento e Recuperação da Informação/métodos , Conformação de Ácido Nucleico
18.
J Comput Biol ; 15(2): 139-63, 2008 Mar.
Artigo em Inglês | MEDLINE | ID: mdl-18312147

RESUMO

Algorithms for prediction of RNA secondary structure-the set of base pairs that form when an RNA molecule folds-are valuable to biologists who aim to understand RNA structure and function. Improving the accuracy and efficiency of prediction methods is an ongoing challenge, particularly for pseudoknotted secondary structures, in which base pairs overlap. This challenge is biologically important, since pseudoknotted structures play essential roles in functions of many RNA molecules, such as splicing and ribosomal frameshifting. State-of-the-art methods, which are based on free energy minimization, have high run-time complexity (typically Theta(n(5)) or worse), and can handle (minimize over) only limited types of pseudoknotted structures. We propose a new approach for prediction of pseudoknotted structures, motivated by the hypothesis that RNA structures fold hierarchically, with pseudoknot-free (non-overlapping) base pairs forming first, and pseudoknots forming later so as to minimize energy relative to the folded pseudoknot-free structure. Our HFold algorithm uses two-phase energy minimization to predict hierarchically formed secondary structures in O(n(3)) time, matching the complexity of the best algorithms for pseudoknot-free secondary structure prediction via energy minimization. Our algorithm can handle a wide range of biological structures, including kissing hairpins and nested kissing hairpins, which have previously required Theta(n(6)) time.


Assuntos
Algoritmos , Conformação de Ácido Nucleico , RNA/química , Análise de Sequência de RNA , Pareamento de Bases , Sequência de Bases , Biologia Computacional , Matemática , Dados de Sequência Molecular , Software
19.
Bioinformatics ; 23(13): i19-28, 2007 Jul 01.
Artigo em Inglês | MEDLINE | ID: mdl-17646296

RESUMO

MOTIVATION: Accurate prediction of RNA secondary structure from the base sequence is an unsolved computational challenge. The accuracy of predictions made by free energy minimization is limited by the quality of the energy parameters in the underlying free energy model. The most widely used model, the Turner99 model, has hundreds of parameters, and so a robust parameter estimation scheme should efficiently handle large data sets with thousands of structures. Moreover, the estimation scheme should also be trained using available experimental free energy data in addition to structural data. RESULTS: In this work, we present constraint generation (CG), the first computational approach to RNA free energy parameter estimation that can be efficiently trained on large sets of structural as well as thermodynamic data. Our CG approach employs a novel iterative scheme, whereby the energy values are first computed as the solution to a constrained optimization problem. Then the newly computed energy parameters are used to update the constraints on the optimization function, so as to better optimize the energy parameters in the next iteration. Using our method on biologically sound data, we obtain revised parameters for the Turner99 energy model. We show that by using our new parameters, we obtain significant improvements in prediction accuracy over current state of-the-art methods. AVAILABILITY: Our CG implementation is available at http://www.rnasoft.ca/CG/.


Assuntos
Algoritmos , Modelos Químicos , Modelos Moleculares , RNA/química , RNA/ultraestrutura , Análise de Sequência de RNA/métodos , Sequência de Bases , Simulação por Computador , Dados de Sequência Molecular , Conformação de Ácido Nucleico
20.
Am J Med Genet B Neuropsychiatr Genet ; 147B(6): 880-9, 2008 Sep 05.
Artigo em Inglês | MEDLINE | ID: mdl-18205168

RESUMO

Nuclear receptor 2E1 gene (NR2E1) resides within a 6q21-22 locus for bipolar disorder and schizophrenia. Mice deleted for Nr2e1 show altered neurogenesis, cortical and limbic abnormalities, aggression, hyperexcitability, and cognitive impairment. NR2E1 is therefore a positional and functional candidate for involvement in mental illness. We performed association analyses in 394 patients with bipolar disorder, 396 with schizophrenia, and 479 controls using six common markers and haplotypes. We also performed a comprehensive mutation screen of NR2E1, resequencing its entire coding region, complete 5' and 3' untranslated regions, consensus splice-sites, and evolutionarily conserved regions in 126 humans with bipolar disorder, schizophrenia, or aggressive disorders. NR2E1 was associated with bipolar disorder I and II [odds ratio (OR = 0.77, P = 0.013), bipolar disorder I (OR = 0.77, P = 0.015), bipolar disorder in females (OR = 0.72, P = 0.009), and with age at onset < or = 25 years (OR = 0.67, P = 0.006)], all of which remained significant after correcting for multiple comparisons. We identified eight novel candidate mutations that were absent in 325 controls; four of these were predicted to alter known neural transcription factor binding sites. Analyses of NR2E1 mRNA in human brain revealed forebrain-specific transcription. The data presented support the hypothesis that genetic variation at NR2E1 may be associated with susceptibility to brain-behavior disorders.


Assuntos
Agressão/fisiologia , Transtorno Bipolar/genética , Análise Mutacional de DNA/métodos , Ligação Genética , Receptores Citoplasmáticos e Nucleares/genética , Esquizofrenia/genética , Adolescente , Adulto , Idoso , Idoso de 80 Anos ou mais , Estudos de Casos e Controles , Feminino , Frequência do Gene , Predisposição Genética para Doença , Humanos , Masculino , Repetições de Microssatélites , Pessoa de Meia-Idade , Mutação/fisiologia , Receptores Nucleares Órfãos , Polimorfismo de Nucleotídeo Único
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA