1.
Curr Protoc Bioinformatics ; 63(1): e54, 2018 09.
Article in English | MEDLINE | ID: mdl-30168910

ABSTRACT

The Basic Local Alignment Search Tool (BLAST) is the first resource to computationally characterize a novel amino acid or nucleic acid sequence. BLAST plays important roles in genomics, transcriptomics, and protein science. For numerous academic and commercial researchers, neither BLAST Web servers nor cloud resources satisfy the requirements of high-throughput comparative genomic pipelines or company policies. For such users, this unit describes how to install BLAST locally, either on a standalone workstation, or preferably on a compute cluster. We provide practical guidance for the planning and the installation under the LINUX, Windows, and Mac OS X operating systems. We propose strategies for downloading existing and generating new sequence databases in BLAST format. © 2018 by John Wiley & Sons, Inc.
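As a rough illustration of generating a new sequence database in BLAST format and searching it locally, the sketch below assembles command lines for the BLAST+ programs `makeblastdb` and `blastn`. It is not taken from the unit itself. The file names (`seqs.fasta`, `mydb`, `query.fasta`, `hits.tsv`) and the E-value cutoff are hypothetical placeholders, and the commands are only printed, not executed, so the sketch runs without BLAST+ installed.

```python
import shlex


def makeblastdb_command(fasta_path, db_name, dbtype="nucl"):
    """Build a makeblastdb command that formats a FASTA file as a BLAST database."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype,
            "-out", db_name, "-parse_seqids"]


def blastn_command(query_path, db_name, out_path, evalue=1e-5):
    """Build a blastn command that writes tabular output (-outfmt 6)."""
    return ["blastn", "-query", query_path, "-db", db_name,
            "-out", out_path, "-outfmt", "6", "-evalue", str(evalue)]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(shlex.join(makeblastdb_command("seqs.fasta", "mydb")))
    print(shlex.join(blastn_command("query.fasta", "mydb", "hits.tsv")))
```

Passing either list to `subprocess.run(cmd, check=True)` would execute it on a machine where BLAST+ is on the PATH.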


Subject(s)
Algorithms , Sequence Alignment/methods , Databases, Nucleic Acid
2.
Curr Protoc Bioinformatics ; 59: 3.4.1-3.4.24, 2017 09 13.
Article in English | MEDLINE | ID: mdl-28902395

ABSTRACT

BLAST, the Basic Local Alignment Search Tool, is used more frequently than any other biosequence database search program. We show how to run searches on the Web, and demonstrate how to increase performance by fine-tuning arguments for a specific research project. We offer guidance for interpreting results, statistical significance and biological relevance issues, and suggest complementary analyses. This unit covers both protein-to-protein (blastp) searches and translated searches (blastx, tblastn, tblastx). blastx conceptually translates the query sequence and tblastn translates all nucleotide sequences in a database, while tblastx translates both the query and the database sequences into amino acid sequences. © 2017 by John Wiley & Sons, Inc.
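The conceptual translation behind the translated searches can be sketched as a six-frame translation of a nucleotide sequence, shown below using the standard genetic code. This is a minimal illustration of the idea, not BLAST's actual implementation. The function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Translate all three reading frames on both strands."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translation("ATGGCCTAA").items():
        print(name, protein)
```

In these terms, blastx applies such a translation to the query, tblastn to the database sequences, and tblastx to both.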


Subject(s)
Databases, Genetic , Internet , Sequence Analysis, Protein/methods , Software , Amino Acid Sequence , Proteins/chemistry , Proteins/genetics , Sequence Alignment , Sequence Homology, Amino Acid
3.
Curr Protoc Bioinformatics ; 58: 3.3.1-3.3.25, 2017 06 27.
Article in English | MEDLINE | ID: mdl-28654728

ABSTRACT

The Basic Local Alignment Search Tool (BLAST) is the first tool in the annotation of nucleotide or amino acid sequences. BLAST is a flagship of bioinformatics due to its performance and user-friendliness. Beginners and intermediate users will learn how to design and submit blastn and Megablast searches on the Web pages at the National Center for Biotechnology Information. We map nucleic acid sequences to genomes, find identical or similar mRNAs, expressed sequence tag, and noncoding RNA sequences, and run Megablast searches, which are much faster than blastn. Understanding results is assisted by taxonomy reports, genomic views, and multiple alignments. We interpret expected frequency thresholds, biological significance, and statistical significance. Weak hits provide no evidence, but indicate hints for further analyses. We find genes that may code for homologous proteins by translated BLAST. We reduce false positives by filtering out low-complexity regions. Parsed BLAST results can be integrated into analysis pipelines. Links in the output connect to Entrez and PubMed, as well as structural, sequence, interaction, and expression databases. This facilitates integration with a wide spectrum of biological knowledge. © 2017 by John Wiley & Sons, Inc.
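One way parsed BLAST results might be fed into an analysis pipeline is sketched below. It assumes the hits were saved in the BLAST+ tabular format (`-outfmt 6`) with its twelve default columns. The example rows, the sequence identifiers, and the E-value cutoff are invented for illustration and do not come from the unit.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_tabular(handle):
    """Yield one dict per hit from a tab-separated BLAST result stream."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        yield hit


def filter_hits(hits, max_evalue=1e-5):
    """Keep hits whose expected frequency (E-value) is at or below the cutoff."""
    return [h for h in hits if h["evalue"] <= max_evalue]


if __name__ == "__main__":
    # Invented example rows, for illustration only.
    rows = [
        ["query1", "subjA", "98.5", "200", "3", "0", "1", "200", "50", "249", "2e-50", "350"],
        ["query1", "subjB", "71.0", "40", "11", "1", "5", "44", "10", "49", "0.5", "25.1"],
    ]
    text = "\n".join("\t".join(r) for r in rows)
    for hit in filter_hits(parse_tabular(io.StringIO(text))):
        print(hit["sseqid"], hit["evalue"])
```

The cutoff only removes statistically weak hits. Whether a retained hit is biologically significant still has to be judged separately.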


Subject(s)
Computational Biology/methods , Sequence Analysis/methods , Software , Amino Acid Sequence , Internet
4.
Nucleic Acids Res ; 44(10): 4595-609, 2016 06 02.
Article in English | MEDLINE | ID: mdl-26823500

ABSTRACT

We present a theory of pluralistic and stochastic gene regulation. To bridge the gap between empirical studies and mathematical models, we integrate pre-existing observations with our meta-analyses of the ENCODE ChIP-Seq experiments. Earlier evidence includes fluctuations in levels, location, activity, and binding of transcription factors, variable DNA motifs, and bursts in gene expression. Stochastic regulation is also indicated by frequently subdued effects of knockout mutants of regulators, their evolutionary losses/gains and massive rewiring of regulatory sites. We report wide-spread pluralistic regulation in ≈800 000 tightly co-expressed pairs of diverse human genes. Typically, half of ≈50 observed regulators bind to both genes reproducibly, twice as many as in independently expressed gene pairs. We also examine the largest set of co-expressed genes, which code for cytoplasmic ribosomal proteins. Numerous regulatory complexes are highly significantly enriched in ribosomal genes compared to highly expressed non-ribosomal genes. We could not find any DNA-associated, strict sense master regulator. Despite major fluctuations in transcription factor binding, our machine learning model accurately predicted transcript levels using binding sites of 20+ regulators. Our pluralistic and stochastic theory is consistent with partially random binding patterns, redundancy, stochastic regulator binding, burst-like expression, degeneracy of binding motifs and massive regulatory rewiring during evolution.


Subject(s)
Gene Expression Regulation , Models, Genetic , Animals , Binding Sites , Cell Line , Chromatin Immunoprecipitation , DNA/metabolism , Genome, Human , Humans , Machine Learning , Mice , Ribosomal Proteins/genetics , Stochastic Processes
5.
PLoS Comput Biol ; 9(11): e1003326, 2013.
Article in English | MEDLINE | ID: mdl-24244136

ABSTRACT

Mapping the chromosomal locations of transcription factors, nucleosomes, histone modifications, chromatin remodeling enzymes, chaperones, and polymerases is one of the key tasks of modern biology, as evidenced by the Encyclopedia of DNA Elements (ENCODE) Project. To this end, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard methodology. Mapping such protein-DNA interactions in vivo using ChIP-seq presents multiple challenges not only in sample preparation and sequencing but also for computational analysis. Here, we present step-by-step guidelines for the computational analysis of ChIP-seq data. We address all the major steps in the analysis of ChIP-seq data: sequencing depth selection, quality checking, mapping, data normalization, assessment of reproducibility, peak calling, differential binding analysis, controlling the false discovery rate, peak annotation, visualization, and motif analysis. At each step in our guidelines we discuss some of the software tools most frequently used. We also highlight the challenges and problems associated with each step in ChIP-seq data analysis. We present a concise workflow for the analysis of ChIP-seq data in Figure 1 that complements and expands on the recommendations of the ENCODE and modENCODE projects. Each step in the workflow is described in detail in the following sections.
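Controlling the false discovery rate, one of the steps listed above, can be illustrated with the Benjamini-Hochberg adjustment of per-peak p-values. The abstract does not say which procedure the guidelines recommend, so Benjamini-Hochberg is an assumption chosen here as a common option. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Invented p-values for four candidate peaks.
    peak_pvalues = [0.01, 0.04, 0.03, 0.5]
    for p, q in zip(peak_pvalues, benjamini_hochberg(peak_pvalues)):
        print(f"p={p:.3f} adjusted={q:.4f}")
```

Peaks whose adjusted value falls below a chosen threshold, for example 0.05, would be retained under this assumption.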


Subject(s)
Chromatin Immunoprecipitation , Computational Biology/methods , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing , Humans , Reproducibility of Results
6.
Genome Biol ; 13(5): R39, 2012 May 25.
Article in English | MEDLINE | ID: mdl-22630137

ABSTRACT

BACKGROUND: Little is known about the mechanisms of adaptation of life to the extreme environmental conditions encountered in polar regions. Here we present the genome sequence of a unicellular green alga from the division Chlorophyta, Coccomyxa subellipsoidea C-169, which we will hereafter refer to as C-169. This is the first eukaryotic microorganism from a polar environment to have its genome sequenced. RESULTS: The 48.8 Mb genome contained in 20 chromosomes exhibits significant synteny conservation with the chromosomes of its relatives Chlorella variabilis and Chlamydomonas reinhardtii. The order of the genes is highly reshuffled within synteny blocks, suggesting that intra-chromosomal rearrangements were more prevalent than inter-chromosomal rearrangements. Remarkably, Zepp retrotransposons occur in clusters of nested elements with strictly one cluster per chromosome probably residing at the centromere. Several protein families overrepresented in C. subellipsoidea include proteins involved in lipid metabolism, transporters, cellulose synthases and short alcohol dehydrogenases. Conversely, C-169 lacks proteins that exist in all other sequenced chlorophytes, including components of the glycosyl phosphatidyl inositol anchoring system, pyruvate phosphate dikinase and the photosystem 1 reaction center subunit N (PsaN). CONCLUSIONS: We suggest that some of these gene losses and gains could have contributed to adaptation to low temperatures. Comparison of these genomic features with the adaptive strategies of psychrophilic microbes suggests that prokaryotes and eukaryotes followed comparable evolutionary routes to adapt to cold environments.


Subject(s)
Adaptation, Physiological , Chlorophyta/genetics , Chlorophyta/physiology , Cold Temperature , Genome , Evolution, Molecular , Genomics , Phylogeny , Synteny
7.
Plant Cell ; 24(5): 1876-93, 2012 May.
Article in English | MEDLINE | ID: mdl-22634760

ABSTRACT

We used RNA sequencing to query the Chlamydomonas reinhardtii transcriptome for regulation by CO(2) and by the transcription regulator CIA5 (CCM1). Both CO(2) and CIA5 are known to play roles in acclimation to low CO(2) and in induction of an essential CO(2)-concentrating mechanism (CCM), but less is known about their interaction and impact on the whole transcriptome. Our comparison of the transcriptome of a wild type versus a cia5 mutant strain under three different CO(2) conditions, high CO(2) (5%), low CO(2) (0.03 to 0.05%), and very low CO(2) (<0.02%), provided an entry into global changes in the gene expression patterns occurring in response to the interaction between CO(2) and CIA5. We observed a massive impact of CIA5 and CO(2) on the transcriptome, affecting almost 25% of all Chlamydomonas genes, and we discovered an array of gene clusters with distinctive expression patterns that provide insight into the regulatory interaction between CIA5 and CO(2). Several individual clusters respond primarily to either CIA5 or CO(2), providing access to genes regulated by one factor but decoupled from the other. Three distinct clusters clearly associated with CCM-related genes may represent a rich source of candidates for new CCM components, including a small cluster of genes encoding putative inorganic carbon transporters.


Subject(s)
Carbon Dioxide/pharmacology , Chlamydomonas reinhardtii/genetics , Transcriptome/genetics , Chlamydomonas reinhardtii/drug effects , Gene Expression/drug effects , Gene Expression/genetics , Gene Expression Regulation/drug effects , Gene Expression Regulation/genetics , Molecular Sequence Data , Transcriptome/drug effects
8.
Plant Cell ; 24(5): 1860-75, 2012 May.
Article in English | MEDLINE | ID: mdl-22634764

ABSTRACT

A CO(2)-concentrating mechanism (CCM) is essential for the growth of most eukaryotic algae under ambient (392 ppm) and very low (<100 ppm) CO(2) concentrations. In this study, we used replicated deep mRNA sequencing and regulatory network reconstruction to capture a remarkable scope of changes in gene expression that occurs when Chlamydomonas reinhardtii cells are shifted from high to very low levels of CO(2) (≤100 ppm). CCM induction 30 to 180 min post-CO(2) deprivation coincides with statistically significant changes in the expression of an astonishing 38% (5884) of the 15,501 nonoverlapping C. reinhardtii genes. Of these genes, 1088 genes were induced and 3828 genes were downregulated by a log(2) factor of 2. The latter indicate a global reduction in photosynthesis, protein synthesis, and energy-related biochemical pathways. The magnitude of transcriptional rearrangement and its major patterns are robust as analyzed by three different statistical methods. De novo DNA motif discovery revealed new putative binding sites for Myeloid oncogene family transcription factors potentially involved in activating low CO(2)-induced genes. The (CA)(n) repeat (9 ≤ n ≤ 25) is present in 29% of upregulated genes but almost absent from promoters of downregulated genes. These discoveries open many avenues for new research.
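The (CA)(n) repeat criterion (9 ≤ n ≤ 25) can be made concrete with a short scan for maximal runs of the dinucleotide CA. This is only a sketch of one possible reading of the criterion, not the motif discovery method used in the study. The function name and the test sequence are invented.

```python
import re

# Greedy match of consecutive CA dinucleotides gives maximal runs.
CA_RUN = re.compile(r"(?:CA)+")


def find_ca_repeats(seq, min_n=9, max_n=25):
    """Return (start, n) for each maximal (CA)n run with min_n <= n <= max_n."""
    repeats = []
    for match in CA_RUN.finditer(seq.upper()):
        n = len(match.group()) // 2  # number of CA units in the run
        if min_n <= n <= max_n:
            repeats.append((match.start(), n))
    return repeats


if __name__ == "__main__":
    # Invented sequence: one run of 10 units, one of 3, one of 30.
    promoter = "GG" + "CA" * 10 + "TT" + "CA" * 3 + "G" + "CA" * 30
    print(find_ca_repeats(promoter))
```

Runs shorter than 9 units or longer than 25 units are excluded, so only the 10-unit run at position 2 is reported for the invented sequence.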


Subject(s)
Carbon Dioxide/metabolism , Chlamydomonas reinhardtii/metabolism , Chlamydomonas reinhardtii/genetics , Molecular Sequence Data , Plant Proteins/genetics , Plant Proteins/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism
9.
Proc Natl Acad Sci U S A ; 108(47): 19036-41, 2011 Nov 22.
Article in English | MEDLINE | ID: mdl-22065774

ABSTRACT

Negative-strand (NS) RNA viruses comprise many pathogens that cause serious diseases in humans and animals. Despite their clinical importance, little is known about the host factors required for their infection. Using vesicular stomatitis virus (VSV), a prototypic NS RNA virus in the family Rhabdoviridae, we conducted a human genome-wide siRNA screen and identified 72 host genes required for viral infection. Many of these identified genes were also required for infection by two other NS RNA viruses, the lymphocytic choriomeningitis virus of the Arenaviridae family and human parainfluenza virus type 3 of the Paramyxoviridae family. Genes affecting different stages of VSV infection, such as entry/uncoating, gene expression, and assembly/release, were identified. Depletion of the proteins of the coatomer complex I or its upstream effectors ARF1 or GBF1 led to detection of reduced levels of VSV RNA. Coatomer complex I was also required for infection of lymphocytic choriomeningitis virus and human parainfluenza virus type 3. These results highlight the evolutionarily conserved requirements for gene expression of diverse families of NS RNA viruses and demonstrate the involvement of host cell secretory pathway in the process.


Subject(s)
Host-Derived Cellular Factors/genetics , Secretory Pathway/genetics , Vesicular stomatitis Indiana virus/physiology , Virus Integration/genetics , Animals , Cell Line , Dogs , Electrophoresis, Polyacrylamide Gel , Gene Expression Profiling , Humans , Immunoblotting , Lymphocytic choriomeningitis virus/genetics , Lymphocytic choriomeningitis virus/physiology , Parainfluenza Virus 3, Human/genetics , Parainfluenza Virus 3, Human/physiology , RNA Interference , RNA, Small Interfering/genetics , Real-Time Polymerase Chain Reaction , Reverse Transcriptase Polymerase Chain Reaction , Vesicular stomatitis Indiana virus/genetics
10.
Brain ; 134(Pt 3): 732-46, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21278085

ABSTRACT

Stroke leads to brain damage with subsequent slow and incomplete recovery of lost brain functions. Enriched housing of stroke-injured rats provides multi-modal sensorimotor stimulation, which improves recovery, although the specific mechanisms involved have not been identified. In rats housed in an enriched environment for two weeks after permanent middle cerebral artery occlusion, we found increased sigma-1 receptor expression in peri-infarct areas. Treatment of rats subjected to permanent or transient middle cerebral artery occlusion with 1-(3,4-dimethoxyphenethyl)-4-(3-phenylpropyl)piperazine dihydrochloride, an agonist of the sigma-1 receptor, starting two days after injury, enhanced the recovery of lost sensorimotor function without decreasing infarct size. The sigma-1 receptor was found in the galactocerebroside enriched membrane microdomains of reactive astrocytes and in neurons. Sigma-1 receptor activation increased the levels of the synaptic proteins neurabin and neurexin in membrane rafts in the peri-infarct area, while sigma-1 receptor silencing prevented sigma-1 receptor-mediated neurite outgrowth in primary cortical neuronal cultures. In astrocytic cultures, oxygen and glucose deprivation induced sigma-1 receptor expression and actin dependent membrane raft formation, the latter blocked by sigma-1 receptor small interfering RNA silencing and pharmacological inhibition. We conclude that sigma-1 receptor activation stimulates recovery after stroke by enhancing cellular transport of biomolecules required for brain repair, thereby stimulating brain plasticity. Pharmacological targeting of the sigma-1 receptor provides new opportunities for stroke treatment beyond the therapeutic window of neuroprotection.


Subject(s)
Brain/metabolism , Infarction, Middle Cerebral Artery/pathology , Infarction, Middle Cerebral Artery/physiopathology , Neuronal Plasticity/physiology , Receptors, sigma/metabolism , Recovery of Function/physiology , Animals , Astrocytes/drug effects , Brain/drug effects , Caveolin 1/genetics , Caveolin 1/metabolism , Cell Hypoxia/drug effects , Cells, Cultured , Disease Models, Animal , Dose-Response Relationship, Drug , Environment , Gene Expression Regulation/drug effects , Gene Expression Regulation/physiology , Glucose/deficiency , Infarction, Middle Cerebral Artery/drug therapy , Infarction, Middle Cerebral Artery/metabolism , Male , Movement/drug effects , Neurites/drug effects , Neurites/physiology , Neuronal Plasticity/drug effects , Neurons/cytology , Neurons/metabolism , Nootropic Agents/pharmacology , Nootropic Agents/therapeutic use , Piperazines/pharmacology , Piperazines/therapeutic use , Protein Transport/drug effects , Psychomotor Performance/drug effects , RNA, Small Interfering/pharmacology , Rats , Rats, Inbred SHR , Receptors, sigma/genetics , Recovery of Function/drug effects , Statistics, Nonparametric , Transfection/methods , Sigma-1 Receptor
11.
Physiol Genomics ; 43(3): 121-35, 2011 Feb 11.
Article in English | MEDLINE | ID: mdl-21098682

ABSTRACT

Liver-specific ablation of cytochrome P450 reductase in mice (LCN) results in hepatic steatosis that can progress to steatohepatitis characterized by inflammation and fibrosis. The specific cause of the fatty liver phenotype is poorly understood but is hypothesized to result from elevated expression of genes encoding fatty acid synthetic enzymes. Since expression of these genes is known to be suppressed by polyunsaturated fatty acids, we performed physiological and genomics studies to evaluate the effects of dietary linoleic and linolenic fatty acids (PUFA) or arachidonic and docosahexaenoic acids (HUFA) on the hepatic phenotypes of control and LCN mice by comparison with a diet enriched in saturated fatty acids. The dietary interventions with HUFA reduced the fatty liver phenotype in livers of LCN mice and altered the gene expression patterns in these livers to more closely resemble those of control mice. Importantly, the expression of genes encoding lipid pathway enzymes was not different between controls and LCN livers, indicating a strong influence of diet over POR genotype. These analyses highlighted the impact of POR ablation on expression of genes encoding P450 enzymes and proteins involved in stress and inflammation. We also found that livers from animals of both genotypes fed diets enriched in PUFA had gene expression patterns more closely resembling those fed diets enriched in saturated fatty acids. These results strongly suggest only HUFA supplied from an exogenous source can suppress hepatic lipogenesis.


Subject(s)
Cytochrome P-450 Enzyme System/metabolism , Dietary Fats/pharmacology , Fatty Acids/pharmacology , Fatty Liver/enzymology , Animals , Blotting, Western , Body Weight/drug effects , Cholesterol/metabolism , Dietary Fats/administration & dosage , Disease Models, Animal , Fatty Acids/administration & dosage , Fatty Liver/blood , Fatty Liver/genetics , Fatty Liver/pathology , Feeding Behavior/drug effects , Gene Expression Profiling , Gene Expression Regulation/drug effects , Genotype , Lipids/analysis , Liver/drug effects , Liver/metabolism , Liver/pathology , Male , Mice , Organ Size/drug effects , Polymerase Chain Reaction , Triglycerides/metabolism
12.
BMC Plant Biol ; 10: 238, 2010 Nov 05.
Article in English | MEDLINE | ID: mdl-21050490

ABSTRACT

BACKGROUND: The molecular mechanisms of genome reprogramming during transcriptional responses to stress are associated with specific chromatin modifications. Available data, however, describe histone modifications only at individual plant genes induced by stress. We have no knowledge of chromatin modifications taking place at genes whose transcription has been down-regulated or on the genome-wide chromatin modification patterns that occur during the plant's response to dehydration stress. RESULTS: Using chromatin immunoprecipitation and deep sequencing (ChIP-Seq) we established the whole-genome distribution patterns of histone H3 lysine 4 mono-, di-, and tri-methylation (H3K4me1, H3K4me2, and H3K4me3, respectively) in Arabidopsis thaliana during watered and dehydration stress conditions. In contrast to the relatively even distribution of H3 throughout the genome, the H3K4me1, H3K4me2, and H3K4me3 marks are predominantly located on genes. About 90% of annotated genes carry one or more of the H3K4 methylation marks. The H3K4me1 and H3K4me2 marks are more widely distributed (80% and 84%, respectively) than the H3K4me3 marks (62%), but the H3K4me2 and H3K4me1 levels changed only modestly during dehydration stress. By contrast, the H3K4me3 abundance changed robustly when transcript levels from responding genes increased or decreased. In contrast to the prominent H3K4me3 peaks present at the 5'-ends of most transcribed genes, genes inducible by dehydration and ABA displayed atypically broader H3K4me3 distribution profiles that were present before and after the stress. CONCLUSIONS: A higher number (90%) of annotated Arabidopsis genes carry one or more types of H3K4me marks than previously reported. During the response to dehydration stress the changes in H3K4me1, H3K4me2, and H3K4me3 patterns show different dynamics and specific patterns at up-regulated, down-regulated, and unaffected genes. The different behavior of each methylation mark during the response process illustrates that they have distinct roles in the transcriptional response of implicated genes. The broad H3K4me3 distribution profiles on nucleosomes of stress-induced genes uncovered a specific chromatin pattern associated with many of the genes involved in the dehydration stress response.


Subject(s)
Arabidopsis/metabolism , Histones/metabolism , Lysine/metabolism , Stress, Physiological , Abscisic Acid/pharmacology , Arabidopsis/genetics , Chromatin Immunoprecipitation , Dehydration , Gene Expression Profiling , Genome, Plant/genetics , High-Throughput Nucleotide Sequencing , Methylation/drug effects , Plant Growth Regulators/pharmacology
13.
PLoS One ; 5(9): e12984, 2010 Sep 24.
Article in English | MEDLINE | ID: mdl-20886052

ABSTRACT

BACKGROUND: Transcription is affected by nucleosomal resistance against polymerase passage. In turn, nucleosomal resistance is determined by DNA sequence, histone chaperones and remodeling enzymes. The contributions of these factors are widely debated: one recent title claims "… DNA-encoded nucleosome organization…" while another title states that "histone-DNA interactions are not the major determinant of nucleosome positions." These opposing conclusions were drawn from similar experiments analyzed by idealized methods. We attempt to resolve this controversy to reveal nucleosomal competency for transcription. METHODOLOGY/PRINCIPAL FINDINGS: To this end, we analyzed 26 in vivo, nonlinked, and in vitro genome-wide nucleosome maps/replicates by new, rigorous methods. Individual H2A nucleosomes are reconstituted inaccurately by transcription, chaperones and remodeling enzymes. At gene centers, weakly positioned nucleosome arrays facilitate rapid histone eviction and remodeling, easing polymerase passage. Fuzzy positioning is not due to artefacts. At the regional level, transcriptional competency is strongly influenced by intrinsic histone-DNA affinities. This is confirmed by reproducing the high in vivo occupancy of translated regions and the low occupancy of intergenic regions in reconstitutions from purified DNA and histones. Regional level occupancy patterns are protected from invading histones by nucleosome excluding sequences and barrier nucleosomes at gene boundaries and within genes. CONCLUSIONS/SIGNIFICANCE: Dense arrays of weakly positioned nucleosomes appear to be necessary for transcription. Weak positioning at exons facilitates temporary remodeling, polymerase passage and hence the competency for transcription. At regional levels, the DNA sequence plays a major role in determining these features but positions of individual nucleosomes are typically modified by transcription, chaperones and enzymes. This competency is reduced at intergenic regions by sequence features, barrier nucleosomes, and proteins, preventing accessibility regulation of untargeted genes. This combination of DNA- and protein-influenced positioning regulates DNA accessibility and competence for regulatory protein binding and transcription. Interactive nucleosome displays are offered at http://chromatin.unl.edu/cgi-bin/skyline.cgi.


Subject(s)
Chromatin Assembly and Disassembly , Chromatin/metabolism , Nucleosomes/metabolism , Saccharomyces cerevisiae/genetics , Transcription, Genetic , Chromatin/genetics , Histones/genetics , Histones/metabolism , Nucleosomes/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
14.
Methods Mol Biol ; 674: 1-22, 2010.
Article in English | MEDLINE | ID: mdl-20827582

ABSTRACT

Here we provide a pragmatic, high-level overview of the computational approaches and tools for the discovery of transcription factor binding sites. Unraveling transcription regulatory networks and their malfunctions such as cancer became feasible due to recent stellar progress in experimental techniques and computational analyses. While predictions of isolated sites still pose notorious challenges, cis-regulatory modules (clusters) of binding sites can now be identified with high accuracy. Further support comes from conserved DNA segments, co-regulation, transposable elements, nucleosomes, and three-dimensional chromosomal structures. We introduce computational tools for the analysis and interpretation of chromatin immunoprecipitation, next-generation sequencing, SELEX, and protein-binding microarray results. Because immunoprecipitation produces overly large DNA segments and well over half of the sequencing reads constitute background noise, methods are presented for background correction, sequence read mapping, peak calling, false discovery rate estimation, and co-localization analyses. To discover short binding site motifs from extensive immunoprecipitation segments, we recommend algorithms and software based on expectation maximization and Gibbs sampling. Data integration using several databases further improves performance. Binding sites can be visualized in genomic and chromatin context using genome browsers. Binding site information, integrated with co-expression in large compendia of gene expression experiments, allows us to reveal complex transcriptional regulatory networks.


Subject(s)
Computational Biology/methods , Transcription Factors/metabolism , Animals , Binding Sites , Databases, Protein , Humans , Transcription Factors/deficiency , Transcription Factors/genetics
15.
Methods Mol Biol ; 674: 161-77, 2010.
Article in English | MEDLINE | ID: mdl-20827591

ABSTRACT

Localizing the binding sites of regulatory proteins is becoming increasingly feasible and accurate. This is due to dramatic progress not only in chromatin immunoprecipitation combined with next-generation sequencing (ChIP-seq) but also in advanced statistical analyses. A fundamental issue, however, is the alarming number of false positive predictions. This problem can be remedied by improved peak calling methods of twin peaks, one at each strand of the DNA, kernel density estimators, and false discovery rate estimations based on control libraries. Predictions are filtered by de novo motif discovery in the peak environments. These methods have been implemented in, among others, Valouev et al.'s Quantitative Enrichment of Sequence Tags (QuEST) software tool. We demonstrate the prediction of the human growth-associated binding protein (GABPalpha) based on ChIP-seq observations.


Subject(s)
Chromatin Immunoprecipitation , Sequence Analysis, DNA , Transcription Factors/metabolism , Binding Sites , False Positive Reactions , GA-Binding Protein Transcription Factor/metabolism , Humans , Internet , Jurkat Cells , Probability , Regulatory Sequences, Nucleic Acid/genetics , Reproducibility of Results , Software
16.
Curr Protoc Bioinformatics ; Chapter 3: 3.3.1-3.3.26, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19496060

ABSTRACT

The Basic Local Alignment Search Tool (BLAST) is a keystone of bioinformatics due to its performance and user-friendliness. Beginner and intermediate users will learn how to design and submit blastn and Megablast searches on the Web pages at the National Center for Biotechnology Information. We map nucleic acid sequences to genomes, find identical or similar mRNA, expressed sequence tag, and noncoding RNA sequences, and run Megablast searches, which are much faster than blastn. Understanding results is assisted by taxonomy reports, genomic views, and multiple alignments. We interpret expected frequency thresholds, biological significance, and statistical significance. Weak hits provide no evidence, but hints for further analyses. We find genes that may code for homologous proteins by translated BLAST. We reduce false positives by filtering out low-complexity regions. Parsed BLAST results can be integrated into analysis pipelines. Links in the output connect to Entrez, PUBMED, structural, sequence, interaction, and expression databases. This facilitates integration with a wide spectrum of biological knowledge.


Subject(s)
Base Sequence , Computational Biology/methods , Sequence Alignment/methods , DNA/chemistry , Databases, Genetic , Information Storage and Retrieval , Software
18.
Genetics ; 179(1): 177-92, 2008 May.
Article in English | MEDLINE | ID: mdl-18493050

ABSTRACT

The availability of the complete DNA sequence of the Chlamydomonas reinhardtii genome and advanced computational biology tools has allowed elucidation and study of the small ubiquitin-like modifier (SUMO) system in this unicellular photosynthetic alga and model eukaryotic cell system. SUMO is a member of a ubiquitin-like protein superfamily that is covalently attached to target proteins as a post-translational modification to alter the localization, stability, and/or function of the target protein in response to changes in the cellular environment. Three SUMO homologs (CrSUMO96, CrSUMO97, and CrSUMO148) and three novel SUMO-related proteins (CrSUMO-like89A, CrSUMO-like89B, and CrSUMO-like90) were found by diverse gene predictions, hidden Markov models, and database search tools inferring from Homo sapiens, Saccharomyces cerevisiae, and Arabidopsis thaliana SUMOs. Among them, CrSUMO96, which can be recognized by the A. thaliana anti-SUMO1 antibody, was studied in detail. Free CrSUMO96 was purified by immunoprecipitation and identified by mass spectrometry analysis. A SUMO-conjugating enzyme (SCE) (E2, Ubc9) in C. reinhardtii was shown to be functional in an Escherichia coli-based in vivo chimeric SUMOylation system. Antibodies to CrSUMO96 recognized free and conjugated forms of CrSUMO96 in Western blot analysis of whole-cell extracts and nuclear localized SUMOylated proteins with in situ immunofluorescence. Western blot analysis showed a marked increase in SUMO conjugated proteins when the cells were subjected to environmental stresses, such as heat shock and osmotic stress. Related analyses revealed multiple potential ubiquitin genes along with two Rub1 genes and one Ufm1 gene in the C. reinhardtii genome.


Subject(s)
Chlamydomonas reinhardtii/genetics , Computational Biology/methods , Small Ubiquitin-Related Modifier Proteins/genetics , Amino Acid Sequence , Animals , Blotting, Western , DNA Primers/genetics , Databases, Genetic , Fluorescent Antibody Technique , Immunoprecipitation , Markov Chains , Mass Spectrometry , Models, Genetic , Molecular Sequence Data , Reverse Transcriptase Polymerase Chain Reaction
19.
Plant Cell ; 20(3): 568-79, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18375658

ABSTRACT

Gene duplication followed by functional specialization is a potent force in the evolution of biological diversity. A comparative study of two highly conserved duplicated genes, ARABIDOPSIS TRITHORAX-LIKE PROTEIN1 (ATX1) and ATX2, revealed features of both partial redundancy and functional divergence. Although the genes are structurally similar, their regulatory sequences have diverged, resulting in distinct temporal and spatial patterns of expression of the ATX1 and ATX2 genes. We found that ATX2 methylates only a limited fraction of nucleosomes and that ATX1 and ATX2 influence the expression of largely nonoverlapping gene sets. Even when coregulating shared targets, ATX1 and ATX2 may employ different mechanisms. Most remarkable is the divergence of their biochemical activities: both proteins methylate K4 of histone H3, but while ATX1 trimethylates it, ATX2 dimethylates it. ATX2 and ATX1 provide an example of K4 dimethyltransferase activity separated from K4 trimethyltransferase activity.


Subject(s)
Arabidopsis Proteins/genetics , Genes, Duplicate , Transcription Factors/genetics , Arabidopsis Proteins/metabolism , Arabidopsis Proteins/physiology , Chromatin Immunoprecipitation , Gene Expression Regulation, Plant , Histone-Lysine N-Methyltransferase , Histones/metabolism , Methylation , Models, Genetic , Nucleosomes/metabolism , Oligonucleotide Array Sequence Analysis , Protein Isoforms/genetics , Protein Isoforms/metabolism , Protein Isoforms/physiology , Reverse Transcriptase Polymerase Chain Reaction , Transcription Factors/metabolism , Transcription Factors/physiology
20.
Nucleic Acids Res ; 35(2): 433-40, 2007.
Article in English | MEDLINE | ID: mdl-17169992

ABSTRACT

Highly accurate knockdown functional analyses based on RNA interference (RNAi) require the most complete possible hydrolysis of the targeted mRNA while avoiding the degradation of untargeted genes (off-target effects). This in turn requires significant improvements to target selection for two reasons. First, the average silencing activity of randomly selected siRNAs is as low as 62%. Second, applying more than five different siRNAs may lead to saturation of the RNA-induced silencing complex (RISC) and to the degradation of untargeted genes. Therefore, selecting a small number of highly active siRNAs is critical for maximizing knockdown and minimizing off-target effects. To satisfy these needs, a publicly available and transparent machine learning tool is presented that ranks all possible siRNAs for each targeted gene. Support vector machines (SVMs) with polynomial kernels and constrained optimization models select and utilize the most predictive effective combinations from 572 sequence, thermodynamic, accessibility and self-hairpin features over 2200 published siRNAs. This tool reaches an accuracy of 92.3% in cross-validation experiments. We fully present the underlying biophysical signature that involves free energy, accessibility and dinucleotide characteristics. We show that while complete silencing is possible at certain structured target sites, accessibility information improves the prediction of the 90% active siRNA target sites. Fast siRNA activity predictions can be performed on our web server at http://optirna.unl.edu/.


Subject(s)
Artificial Intelligence , RNA Interference , RNA, Small Interfering/chemistry , Computational Biology/methods , Internet , Software