Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 10 de 10
Filter
1.
Bioinformatics ; 26(12): i175-82, 2010 Jun 15.
Article in English | MEDLINE | ID: mdl-20529903

ABSTRACT

MOTIVATION: Association analysis is the method of choice for studying complex multifactorial diseases. The premise of this method is that affected persons contain some common genomic regions with similar SNP alleles and such areas will be found in this analysis. An important disadvantage of GWA studies is that it does not distinguish between genomic areas that are inherited from a common ancestor [identical by descent (IBD)] and areas that are identical merely by state [identical by state (IBS)]. Clearly, areas that can be marked with higher probability as IBD and have the same correlation with the disease status of identical areas that are more probably only IBS, are better candidates to be causative, and yet this distinction is not encoded in standard association analysis. RESULTS: We develop a factorial hidden Markov model-based algorithm for computing genome-wide IBD sharing. The algorithm accepts as input SNP data of measured individuals and estimates the probability of IBD at each locus for every pair of individuals. For two g-degree relatives, when g > or = 8, the computation yields a precision of IBD tagging of over 50% higher than previous methods for 95% recall. Our algorithm uses a first-order Markovian model for the linkage disequilibrium process and employs a reduction of the state space of the inheritance vector from being exponential in g to quadratic. The higher accuracy along with the reduced time complexity marks our method as a feasible means for IBD mapping in practical scenarios. AVAILABILITY: A software implementation, called IBDMAP, is freely available at http://bioinfo.cs.technion.ac.il/IBDmap.


Subject(s)
Chromosome Mapping/methods , Genome , Genomics/methods , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Algorithms , Genome-Wide Association Study , Markov Chains , Models, Genetic
2.
RNA ; 14(7): 1352-65, 2008 Jul.
Article in English | MEDLINE | ID: mdl-18492794

ABSTRACT

Cotranslational synthesis of proteins into the endoplasmic reticulum is preceded by targeting of the translating mRNA once a signal peptide emerges from the ribosome exit tunnel. Many mRNAs, however, are unlikely to be targeted by this process because they encode proteins that do not contain a signal peptide or because they are too short to be recognized by the signal recognition particle. Herein we tested the possible involvement of the 3'-UTR in the localization of an mRNA that encodes a very short Saccharomyces cerevisiae protein (Pmp1). We found by ribosome density mapping, sedimentation analysis, differential centrifugation, and fluorescent in situ hybridization that the 3'-UTR is essential for the association of the transcript with membrane compartments. Fusion of the 3'-UTR to heterologous open reading frames conferred on them a sedimentation and cellular localization pattern resembling that of PMP1. Mutation analysis revealed that a repeating UG-rich sequence within the 3'-UTR is important for membrane association. Taken together, our results reveal an essential role for elements within the 3'-UTR in the localization of an mRNA that is likely to be ignored by the standard signal-dependant mechanism.


Subject(s)
3' Untranslated Regions/metabolism , Membrane Proteins/genetics , Membrane Proteins/metabolism , Nerve Tissue Proteins/genetics , Nerve Tissue Proteins/metabolism , Proteolipids/genetics , Proteolipids/metabolism , RNA, Fungal/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , 3' Untranslated Regions/analysis , 3' Untranslated Regions/genetics , Codon, Terminator , Endoplasmic Reticulum/metabolism , Membrane Proteins/analysis , Mutagenesis , Nerve Tissue Proteins/analysis , Polyribosomes , Protein Transport , Proteolipids/analysis , Proton-Translocating ATPases , RNA, Fungal/analysis , RNA, Fungal/genetics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/analysis
3.
Bioinformatics ; 25(12): i196-203, 2009 Jun 15.
Article in English | MEDLINE | ID: mdl-19477987

ABSTRACT

UNLABELLED: We develop an hidden Markov model (HMM)-based algorithm for computing exact parametric and non-parametric linkage scores in larger pedigrees than was possible before. The algorithm is applicable whenever there are chains of persons in the pedigree with no genetic measurements and with unknown affection status. The algorithm is based on shrinking the state space of the HMM considerably using such chains. In a two g-degree cousins pedigree the reduction drops the state space from being exponential in g to being linear in g. For a Finnish family in which two affected children suffer from a rare cold-inducing sweating syndrome, we were able to reduce the state space by more than five orders of magnitude from 2(50) to 2(32). In another pedigree of state-space size of 2(27), used for a study of pituitary adenoma, the state space reduced by a factor of 8.5 and consequently exact linkage scores can now be computed, rather than approximated. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Computational Biology/methods , Genetic Linkage/genetics , Markov Chains , Humans , Phenotype , Regression Analysis
4.
BMC Bioinformatics ; 10: 76, 2009 Mar 04.
Article in English | MEDLINE | ID: mdl-19257906

ABSTRACT

BACKGROUND: Scanning large genomes with a sliding window in search of locally stable RNA structures is a well motivated problem in bioinformatics. Given a predefined window size L and an RNA sequence S of size N (L < N), the consecutive windows folding problem is to compute the minimal free energy (MFE) for the folding of each of the L-sized substrings of S. The consecutive windows folding problem can be naively solved in O(NL3) by applying any of the classical cubic-time RNA folding algorithms to each of the N-L windows of size L. Recently an O(NL2) solution for this problem has been described. RESULTS: Here, we describe and implement an O(NLpsi(L)) engine for the consecutive windows folding problem, where psi(L) is shown to converge to O(1) under the assumption of a standard probabilistic polymer folding model, yielding an O(L) speedup which is experimentally confirmed. Using this tool, we note an intriguing directionality (5'-3' vs. 3'-5') folding bias, i.e. that the minimal free energy (MFE) of folding is higher in the native direction of the DNA than in the reverse direction of various genomic regions in several organisms including regions of the genomes that do not encode proteins or ncRNA. This bias largely emerges from the genomic dinucleotide bias which affects the MFE, however we see some variations in the folding bias in the different genomic regions when normalized to the dinucleotide bias. We also present results from calculating the MFE landscape of a mouse chromosome 1, characterizing the MFE of the long ncRNA molecules that reside in this chromosome. CONCLUSION: The efficient consecutive windows folding engine described in this paper allows for genome wide scans for ncRNA molecules as well as large-scale statistics. This is implemented here as a software tool, called RNAslider, and applied to the scanning of long chromosomes, leading to the observation of features that are visible only on a large scale.


Subject(s)
Algorithms , Genome , RNA/chemistry , Software , Nucleic Acid Conformation , RNA, Untranslated/chemistry
5.
Bioinformatics ; 23(2): e184-90, 2007 Jan 15.
Article in English | MEDLINE | ID: mdl-17237090

ABSTRACT

UNLABELLED: MOTIVATION AND METHODS: All living organisms and the survival of all cells critically depend on their ability to sense and quickly adapt to changes in the environment and to other stress conditions. We study stress response mechanisms in Saccharomyces cerevisiae by identifying genes that, according to very stringent criteria, have persistent co-expression under a variety of stress conditions. This is enabled through a fast clique search method applied to the intersection of several co-expression graphs calculated over the data of Gasch et al. This method exploits the topological characteristics of these graphs. RESULTS: We observe cliques in the intersection graphs that are much larger than expected under a null model of changing gene identities for different stress conditions but maintaining the co-expression topology within each one. Persistent cliques are analyzed to identify enriched function as well as enriched regulation by a small number of TFs. These TFs, therefore, characterize a universal and persistent reaction to stress response. We further demonstrate that the vertices (genes) of many cliques in the intersection graphs are co-localized in the yeast genome, to a degree far beyond the random expectation. Co-localization can hypothetically contribute to a quick co-ordinated response. We propose the use of persistent cliques in further study of properties of co-regulation.


Subject(s)
Gene Expression Regulation, Fungal/physiology , Gene Expression/physiology , Heat-Shock Response/physiology , Models, Biological , Oxidative Stress/physiology , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Computer Simulation , Heat-Shock Proteins/metabolism , Transcription Factors/metabolism
6.
J Comput Biol ; 14(6): 856-72, 2007.
Article in English | MEDLINE | ID: mdl-17691898

ABSTRACT

mRNA molecules are folded in the cells and therefore many of their substrings may actually be inaccessible to protein and microRNA binding. The need to apply an accessibility criterion to the task of genome-wide mRNA motif discovery raises the challenge of overcoming the core O(n(3)) factor imposed by the time complexity of the currently best known algorithms for RNA secondary structure prediction. We speed up the dynamic programming algorithms that are standard for RNA folding prediction. Our new approach significantly reduces the computations without sacrificing the optimality of the results, yielding an expected time complexity of O(n(2) psi(n)), where psi(n) is shown to be constant on average under standard polymer folding models. A benchmark analysis confirms that in practice the runtime ratio between the previous approach and the new algorithm indeed grows linearly with increasing sequence size. The fast new RNA folding algorithm is utilized for genome-wide discovery of accessible cis-regulatory motifs in data sets of ribosomal densities and decay rates of S. cerevisiae genes and to the mining of exposed binding sites of tissue-specific microRNAs in A. thaliana.


Subject(s)
Algorithms , RNA/chemistry , RNA/genetics , Regulatory Sequences, Ribonucleic Acid , Arabidopsis/genetics , Computational Biology , MicroRNAs/genetics , Models, Molecular , Nucleic Acid Conformation , Ribosomes/physiology , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/genetics , Software
7.
J Comput Biol ; 12(7): 928-42, 2005 Sep.
Article in English | MEDLINE | ID: mdl-16201913

ABSTRACT

An efficient algorithm is presented for detecting approximate tandem repeats in genomic sequences. The algorithm is based on a flexible statistical model which allows a wide range of definitions of approximate tandem repeats. The ideas and methods underlying the algorithm are described and its effectiveness on genomic data is demonstrated.


Subject(s)
Computational Biology/methods , Sequence Analysis, DNA/methods , Tandem Repeat Sequences , Algorithms , Computational Biology/statistics & numerical data , Humans , Models, Genetic , Models, Statistical , Sequence Analysis, DNA/statistics & numerical data
8.
J Comput Biol ; 17(8): 1051-65, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20649420

ABSTRACT

The current pairwise RNA (secondary) structural alignment algorithms are based on Sankoff's dynamic programming algorithm from 1985. Sankoff's algorithm requires O(N(6)) time and O(N(4)) space, where N denotes the length of the compared sequences, and thus its applicability is very limited. The current literature offers many heuristics for speeding up Sankoff's alignment process, some making restrictive assumptions on the length or the shape of the RNA substructures. We show how to speed up Sankoff's algorithm in practice via non-heuristic methods, without compromising optimality. Our analysis shows that the expected time complexity of the new algorithm is O(N(4)sigma(N)), where sigma(N) converges to O(N), assuming a standard polymer folding model which was supported by experimental analysis. Hence, our algorithm speeds up Sankoff's algorithm by a linear factor on average. In simulations, our algorithm speeds up computation by a factor of 3-12 for sequences of length 25-250. Code and data sets are available, upon request.


Subject(s)
Algorithms , RNA/chemistry , Sequence Alignment/methods , Animals , Base Sequence , Caenorhabditis elegans/genetics , DNA/chemistry , Nucleic Acid Conformation , Sequence Alignment/economics
9.
J Comput Biol ; 15(7): 721-35, 2008 Sep.
Article in English | MEDLINE | ID: mdl-18652528

ABSTRACT

Probabilistic phylogenetic models which relax the site independence evolution assumption often face the problem of infeasible likelihood computations, for example, for the task of selecting suitable parameters for the model. We present a new approximation method, applicable for a wide range of probabilistic models, which guarantees to upper and lower bound the true likelihood of data, and apply it to the problem of probabilistic phylogenetic models. The new method is complementary to known variational methods that lower bound the likelihood, and it uses similar methods to optimize the bounds from above and below. We applied our method to aligned DNA sequences of various lengths from human in the region of the CFTR gene and homologous from eight mammals, and found the bounds to be appreciably close to the true likelihood whenever it could be computed. When computing the exact likelihood was not feasible, we demonstrated the proximity of the upper and lower variational bounds, implying a tight approximation of the likelihood.


Subject(s)
Algorithms , Models, Statistical , Phylogeny , Sequence Analysis, DNA/methods , Animals , Base Sequence , Computational Biology/methods , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , DNA/genetics , Humans , Image Processing, Computer-Assisted/methods , Mammals/genetics , Mathematics , Models, Genetic
10.
J Theor Biol ; 247(1): 1-10, 2007 Jul 07.
Article in English | MEDLINE | ID: mdl-17416391

ABSTRACT

Genetic recombination is a central and repeated topic of study in the evolution of life. However, along with the influence of recombination on evolution, we understand surprisingly little of how selection shapes the nature of recombination. One explanation for recombination is that it allows organisms to escape from perilous situations where they experience very low fitness. As a corollary, it has been suggested that selection should favor recombination at low fitness and not at high fitness (fitness-associated recombination, FAR), and theory suggests that such strategies can indeed be selected. Here we develop models to further investigate the evolution of FAR. Consistent with previous works, we find that FAR can invade and dominate over a strategy of uniform recombination that is independent of fitness. However, our simulation results suggest that extreme FAR strategies, known as group-elitism, are not necessarily superior to other FAR strategies. Moreover, we argue that FAR domination will often occur with a net loss of mean population fitness. Interestingly, this suggests that the strategy of not recombining at high fitness will sometimes be analogous to a defector strategy from the famous "prisoner's dilemma" game: a selfish strategy that is selected but leads to a loss of mean fitness for all players.


Subject(s)
Game Theory , Models, Genetic , Recombination, Genetic , Animals , Biological Evolution , Computer Simulation
SELECTION OF CITATIONS
SEARCH DETAIL